CN113867627A - Method and system for optimizing performance of storage system - Google Patents

Method and system for optimizing performance of storage system

Info

Publication number
CN113867627A
Authority
CN
China
Prior art keywords
data
hash value
physical address
written data
written
Prior art date
Legal status
Granted
Application number
CN202110999485.XA
Other languages
Chinese (zh)
Other versions
CN113867627B (en)
Inventor
甄凤远
刘志勇
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110999485.XA
Publication of CN113867627A
Application granted
Publication of CN113867627B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for optimizing the performance of a storage system, the method comprising: receiving write data and its logical address, and calculating a hash value of the write data; verifying, according to the hash value, whether the write data is new data; in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk; and forming metadata from the logical address and the hash value of the write data and writing the metadata into a corresponding table. With the method and system, the original LP, PL and HP metadata are changed to LH, HP and PL, and the deduplication mechanism is changed from keeping old data and refusing to write new data to always writing new data and deleting old data according to the value of a reference count. The ordering of the PL table is maintained, and the PL table can grow in a balanced manner under a suitable tree-building strategy. Insertion into and querying of the PL table no longer require additional overflow page handling, which reduces system overhead.

Description

Method and system for optimizing performance of storage system
Technical Field
The invention belongs to the field of computer storage, and particularly relates to a method and a system for optimizing the performance of a storage system.
Background
At present, to improve the overall utilization of the storage system, most all-flash storage systems support online deduplication: by reducing the storage of duplicate data, more host data can be stored in the same capacity.
To support advanced storage features such as deduplication, most storage systems use a global append-write mode, that is, newly written data is always appended sequentially after the data already on the disk. To support deduplication, all-flash storage introduces metadata, which generally includes LP (mapping from logical address to physical address), PL (mapping from physical address to logical address), and HP (mapping from the hash value of a data block to its physical address). When a data block is issued, a hash fingerprint is first calculated from the data block. If the fingerprint does not exist in HP, a new copy of the data is written to physical address P, the three metadata entries LP, PL, and HP are issued, and all three are eventually persisted to disk. If the fingerprint of the newly written data already exists in HP, no new data is written and only LP and PL are written. At a high deduplication rate, the values of P in PL are likely to be discontinuous because of the many HP hits, which unbalances the load across the threads that flush metadata to disk and degrades performance. In addition, at a high deduplication rate the same P may correspond to multiple values of L in PL, which is difficult for a B+ tree to handle and may require extra overflow page processing.
Therefore, a new solution is needed to address this problem.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and a system for optimizing the performance of a storage system, wherein the method comprises:
receiving write data and its logical address, and calculating a hash value of the write data;
verifying, according to the hash value, whether the write data is new data;
in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk; and
forming metadata from the logical address and the hash value of the write data and writing the metadata into a corresponding table.
In some embodiments of the present invention, forming metadata from the logical address and the hash value of the write data and writing the metadata into the corresponding table comprises:
composing the logical address and the hash value of the write data into an LH parameter of the metadata and storing the LH parameter in an LH table;
composing the hash value and the physical address of the write data into an HP parameter of the metadata and storing the HP parameter in an HP table; and
composing the physical address and the logical address of the write data into a PL parameter of the metadata and storing the PL parameter in a PL table.
In some embodiments of the present invention, composing the hash value and the physical address of the write data into an HP parameter of the metadata and storing the HP parameter in an HP table comprises:
appending the number of times the write data has been stored to the disk, as a reference count, to the end of the physical address, and forming the HP parameter from the result together with the hash value.
In some embodiments of the invention, verifying whether the write data is new data according to the hash value comprises:
searching the HP table according to the hash value;
in response to the hash value and its corresponding physical address existing in the HP table, determining that the write data is old data; and
in response to the hash value and its corresponding physical address not existing in the HP table, determining that the write data is new data.
In some embodiments of the invention, the method further comprises:
in response to the write data being old data, writing the write data to the disk and acquiring a new physical address of the write data on the disk;
obtaining, from the HP table, the reference count at the end of the old physical address of the write data;
adding one to the value of the reference count, merging it onto the end of the new physical address, forming a new HP parameter together with the hash value of the write data, and storing the new HP parameter in the HP table; and
storing the old physical address in a corresponding garbage collection table, and, at preset time intervals, cleaning up the data on the disk at the physical addresses recorded in the garbage collection table.
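The old-data path listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Store` class, its list-backed `disk`, and the tuple that holds the physical address together with the reference count are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    # Append-only "disk"; the list index plays the role of the physical address.
    disk: list = field(default_factory=list)
    # HP table: hash value -> (physical address, reference count).
    hp: dict = field(default_factory=dict)
    # Garbage collection table: old physical addresses awaiting cleanup.
    garbage: list = field(default_factory=list)

    def write_old(self, h: str, data: bytes) -> int:
        """Write data whose hash h is already present in the HP table."""
        old_p, count = self.hp[h]
        self.disk.append(data)            # the duplicate is still appended
        new_p = len(self.disk) - 1
        self.hp[h] = (new_p, count + 1)   # new address, count incremented by one
        self.garbage.append(old_p)        # old address is queued for cleanup
        return new_p

    def collect(self) -> None:
        """Cleanup pass, meant to be run at a preset interval."""
        for p in self.garbage:
            self.disk[p] = None           # release the old copy
        self.garbage.clear()
```

Running `collect()` from a timer rather than inline keeps the write path short, which matches the idea of cleaning the garbage collection table at preset intervals.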
In some embodiments of the invention, the method further comprises:
in response to the LH parameter having been stored in the LH table and the HP parameter having been stored in the HP table, returning a write-success signal to the host.
In some embodiments of the invention, the method further comprises:
in response to a data query request initiated by the host, parsing the logical address in the query request;
searching the LH table for the hash value corresponding to the logical address in the data query request;
searching the HP table for the physical address corresponding to the hash value; and
acquiring the physical address, and returning to the host the data stored at that physical address on the disk.
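The two-step lookup of the query path (logical address to hash value, then hash value to physical address) can be illustrated with plain dictionaries. The function name `read` and the dictionary-backed tables and disk are assumptions for the sketch; for brevity the HP value here is only the physical address, without a reference count.

```python
from typing import Optional


def read(lh: dict, hp: dict, disk: dict, logical: str) -> Optional[bytes]:
    """Resolve a logical address through LH and HP, then fetch the data."""
    h = lh.get(logical)          # LH table: logical address -> hash value
    if h is None:
        return None
    physical = hp.get(h)         # HP table: hash value -> physical address
    if physical is None:
        return None
    return disk.get(physical)    # data stored at that physical address
```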
In some embodiments of the invention, the method further comprises:
using the number of times the write data has been stored to the disk as a reference count; and
composing a reference count parameter HN from the hash value of the write data and the reference count, and storing the reference count parameter HN in a reference count HN table.
In some embodiments of the invention, the method further comprises:
in response to a data deletion request initiated by the host, parsing the logical address in the request;
searching the LH table for the hash value corresponding to the logical address in the data deletion request, and searching the reference count HN table for the reference count value corresponding to the hash value;
in response to the reference count value not being 0, decrementing the reference count value corresponding to the hash value by one; and
in response to the reference count value being 0, searching the HP table for the physical address corresponding to the hash value, and deleting the data on the disk at that physical address.
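A small sketch of the deletion path with a separate HN table follows. The text does not say whether the zero check is made before or after the decrement; this example assumes it is made after, so the last reference removes the data. It also leaves the LH entry untouched, since the text does not specify what happens to it. The function name `delete` and the dictionary-backed tables are hypothetical.

```python
def delete(lh: dict, hn: dict, hp: dict, disk: dict, logical: str) -> bool:
    """Handle a delete request; return True if the data was removed from disk."""
    h = lh[logical]              # LH table: logical address -> hash value
    if hn[h] != 0:
        hn[h] -= 1               # HN table: one reference fewer
    if hn[h] == 0:               # assumption: checked after the decrement
        physical = hp[h]         # HP table: hash value -> physical address
        disk.pop(physical, None)
        return True
    return False
```

With two logical addresses sharing one hash value, the first delete only lowers the count and the second one frees the block.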
In another aspect of the present invention, a system for optimizing performance of a storage system is further provided, including:
a receiving module configured to receive write data and its logical address, and to calculate a hash value of the write data;
a verification module configured to verify, according to the hash value, whether the write data is new data;
a processing module configured to, in response to the write data being new data, write the write data to a disk and acquire the physical address of the write data on the disk; and
an execution module configured to form metadata from the logical address and the hash value of the write data and to write the metadata into a corresponding table.
According to the method and system for optimizing the performance of a storage system provided by the present invention, the original LP, PL and HP metadata are changed to LH, HP and PL, and the deduplication mechanism is changed from keeping old data and refusing to write new data to always writing new data and deleting old data according to the value of a reference count. The ordering of the PL table is maintained, and the PL table can grow in a balanced manner under a suitable tree-building strategy. Insertion into and querying of the PL table no longer require additional overflow page handling, which reduces system overhead. Although the amount of data written to disk increases to some extent, RAID writes have to be assembled into stripes anyway, and stripes that are too small cause needless write amplification, so writing new data every time does not have a large performance impact.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of one embodiment of a method for optimizing performance of a storage system in accordance with the present invention;
FIG. 2 is a flow chart illustrating a method for optimizing the performance of a storage system according to an embodiment of the present invention;
FIG. 3 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 4 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 5 is a flow chart of one embodiment of a method for optimizing performance of a storage system according to the present invention;
FIG. 6 is a flow chart illustrating a method for optimizing performance of a storage system according to an embodiment of the present invention;
FIG. 7 is a system configuration diagram of a system for optimizing performance of a storage system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In a first aspect of the embodiments of the present invention, an embodiment of a method for optimizing performance of a storage system is provided, and fig. 1 is a flowchart illustrating an embodiment of a method for optimizing performance of a storage system according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
step S101, receiving write data and its logical address, and calculating a hash value of the write data;
step S102, verifying, according to the hash value, whether the write data is new data;
step S103, in response to the write data being new data, writing the write data to a disk and acquiring the physical address of the write data on the disk; and
step S104, forming metadata from the logical address and the hash value of the write data and writing the metadata into a corresponding table.
The problem addressed by the invention is data management in large-scale storage systems that use solid state disks as the main storage medium. In the prior art, because a solid state disk maps the logical address of stored data to a physical address in a particular way (compared with a mechanical hard disk), deduplication (in theory, no data is duplicated on the solid state disk) can be implemented in the storage layer: by binding the logical or physical address of the data to a characteristic value (hash value) of the data, the storage mapping of the data is realized and the uniqueness of the data is maintained.
This is made possible by metadata, also called intermediary data or relay data. Metadata is not the data itself or the basis of the data but data that describes the data. In the storage field, it is a set of key-value pairs that describe the actual location (physical address) of the stored data and the identifier or index (logical address) of the data.
In particular, metadata generally includes three parameters. In LP, L denotes the logical address of the data and P denotes the physical address of the data on the disk. A logical address is the identifier given to the data by the operating system or an application on the host. A physical address is the address of a storage location at the hardware layer of the solid state disk. LP is a key-value pair with the logical address L as the key and the physical address P as the value. PL is a key-value pair with the physical address P as the key and the logical address L as the value. HP is the key parameter for avoiding duplicate storage of data: H denotes the hash value of the data, so HP is a key-value pair with the hash value of the data as the key and the physical address of the data as the value.
Therefore, when data is written to a solid state disk, the three metadata entries must be written or inserted in addition to storing the data at the corresponding physical address on the disk. LP, PL, and HP are all implemented as B+ tree data structures, so the corresponding metadata can be looked up very quickly.
As described above, in the prior art, storing new data requires hashing the data to obtain its hash value and querying the HP table for that hash value. If the hash value H of some entry in the HP table equals the hash value of the data to be written, the data is already stored on the disk. Under the deduplication mechanism, the data is then not written, and the P of the matching HP key-value pair is used as the physical address of the data to be written when constructing its metadata. That is, the metadata LP, PL, and HP are constructed for the data to be written and stored in the corresponding LP table, PL table, and HP table.
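For comparison, the prior-art write path can be sketched as below. The sketch follows the Background's description that only LP and PL are written on an HP hit. The function name, the use of SHA-256 as the fingerprint, and the list-backed disk are assumptions for illustration only; the source does not name a hash function.

```python
import hashlib


def prior_art_write(lp: dict, pl: dict, hp: dict, disk: list,
                    logical: str, data: bytes) -> int:
    """LP/PL/HP write path: duplicate data is not written again."""
    h = hashlib.sha256(data).hexdigest()    # fingerprint of the data block
    if h in hp:
        p = hp[h]                           # hit: reuse the old physical address
    else:
        disk.append(data)                   # miss: append a new copy
        p = len(disk) - 1
        hp[h] = p
    lp[logical] = p
    pl.setdefault(p, []).append(logical)    # one P can accumulate many L
    return p
```

Writing the same block under two logical addresses leaves a single copy on disk and a PL entry in which one physical address maps to both logical addresses.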
However, in some cases a large share of the written data is duplicated. Under the above deduplication mechanism, the hash values of much of the written data hit existing HP key-value pairs in the HP table, so the physical address P used is always that of old data. When the PL metadata is constructed, the logical addresses L keep increasing while one physical address P corresponds to multiple logical addresses L, so the physical addresses in the PL table become largely discontinuous. As data accumulates, the number of records in the PL table keeps growing, whereas the HP table changes little because much of the data is deduplicated. Lookups in the PL table therefore become progressively slower than lookups in the HP table, which unbalances the load when a storage task runs in multiple threads: the HP query has completed while the PL query, slowed by the malformed data structure of the PL table, has not. This ultimately degrades overall storage performance.
Therefore, in order to solve the above problems, the present invention structurally changes the existing metadata, i.e., changes LP, PL, and HP into LH, HP, and PL. The method comprises the following specific steps:
in step S101, the storage system receives a write data request from the host. The request contains the write data and the corresponding logical address, which is the identifier assigned to the write data by the application or system on the host operating system that issued it. After receiving the write data, the storage system performs a hash operation on it to obtain the hash value of the write data.
In step S102, after the hash value of the write data has been calculated in step S101, the HP table is searched using the hash value to verify whether the write data is new data;
in step S103, if step S102 verifies that the write data is new data, the write data is written to the disk, and the physical address of the write data on the disk is obtained from the corresponding writing tool;
in step S104, the logical address and the hash value of the write data form the key-value pair LH of the metadata, i.e. LH: {"L": "H"}, which in a practical storage system typically looks like {"0x184499d5f4bce82": "e10adc3949ba59abbe56e057f20f883e"}, and the metadata parameter LH of the write data is inserted into the corresponding LH table.
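Building an LH entry of the kind described in step S104 might look like the following. The source does not name the hash function; MD5 is used here only as an assumption because it yields a 32-hex-digit digest like the example value, and `make_lh_entry` is a hypothetical helper.

```python
import hashlib


def make_lh_entry(logical: str, data: bytes) -> dict:
    """Return a one-item LH mapping: logical address -> hash of the data."""
    h = hashlib.md5(data).hexdigest()   # assumption: MD5 as the fingerprint
    return {logical: h}


# Insert the entry for one write into an in-memory LH table.
lh_table = {}
lh_table.update(make_lh_entry("0x184499d5f4bce82", b"example block"))
```

Any collision-resistant digest could take the place of MD5; only the key-value shape of the entry matters for the LH table.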
In some embodiments of the present invention, forming metadata from the logical address and the hash value of the write data and writing the metadata into the corresponding table includes:
composing the logical address and the hash value of the write data into an LH parameter of the metadata and storing the LH parameter in an LH table;
composing the hash value and the physical address of the write data into an HP parameter of the metadata and storing the HP parameter in an HP table; and
composing the physical address and the logical address of the write data into a PL parameter of the metadata and storing the PL parameter in a PL table.
In this embodiment, metadata that describes or indexes the write data is constructed from the physical address, the logical address, and the hash value of the write data, and is used to establish the mapping between the host and the real physical address of the data. Specifically, the LH parameter of the metadata is composed of the logical address and the hash value of the write data and is stored in the LH table, establishing a mapping from the logical address to the hash value. Because the hash value is calculated from the write data and is unique, LH guarantees a unique mapping from the logical address to the write data. The HP parameter of the metadata is composed of the hash value and the physical address of the write data, establishing a mapping from the hash value to the physical address so that data is stored uniquely; that is, no duplicate data exists on the disk, and one hash value H never corresponds to multiple physical addresses P. The PL parameter of the metadata is composed of the physical address and the logical address of the write data, ensuring that the written physical address P is always updated in step with the logical address of the newly written data.
In some embodiments of the present invention, composing the hash value and the physical address of the write data into an HP parameter of the metadata and storing the HP parameter in an HP table includes:
appending the number of times the write data has been stored to the disk, as a reference count, to the end of the physical address, and forming the HP parameter from the result together with the hash value.
In this embodiment, the inventive concept of continuously writing data so that physical addresses stay continuous raises a further problem. If newly written data is already stored on the disk, the write strategy of the present invention still writes it to the disk sequentially (to keep P continuous), and in the metadata the physical address P recorded for the hash value of the written data in the HP table is updated. As a result, data may be deleted by mistake.
Specifically, suppose that at some point the host sends a write request to the storage system, carrying data M of size 5 MB and the logical address L1 corresponding to M. The storage system hashes the data M to obtain its hash value H, queries the HP table with H, finds that the data does not exist in the HP table, stores the data M at physical address P1 on the disk, and creates the metadata LH, HP, and PL for the data M. Some time later, the host sends another write request in which the data is still the 5 MB data M and the logical address is L2. Hashing again yields H, and querying the HP table with H hits the previously stored HP entry; that is, the data is already stored on the disk. At this point, instead of skipping the write, the method of the present invention writes the data again, obtains the new physical address P2, creates a new HP parameter that replaces physical address P1 with P2 in the HP table, and updates the PL table to keep it continuous. Suppose the host then issues a delete request for the data at logical address L1 (data is deleted by releasing the space at the physical address corresponding to the logical address, rather than by supplying the data itself). In the LH table, the hash values for L1 and for L2 are both the hash value H of the data M, and looking up H in the HP table returns P2, because P1 has been replaced by P2. If the data at P2 is deleted at this time, the data corresponding to logical address L2 is therefore lost.
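The mistaken deletion in this scenario can be reproduced with a toy store that always appends and keeps no reference count. `NaiveStore` and its methods are hypothetical, physical addresses are simple integers starting at 1 (standing in for P1 and P2), and the delete method only removes the data on disk, since that is all the scenario describes.

```python
class NaiveStore:
    """LH/HP/PL store that always appends and keeps no reference count."""

    def __init__(self):
        self.lh, self.hp, self.pl, self.disk = {}, {}, {}, {}
        self.next_p = 1

    def write(self, logical, h, data):
        p = self.next_p
        self.next_p += 1
        self.disk[p] = data
        self.lh[logical] = h
        self.hp[h] = p           # on a repeat write, P1 is replaced by P2
        self.pl[p] = logical
        return p

    def delete(self, logical):
        p = self.hp[self.lh[logical]]   # L1 -> H -> P2, not P1
        self.disk.pop(p, None)

    def read(self, logical):
        return self.disk.get(self.hp.get(self.lh.get(logical)))


store = NaiveStore()
store.write("L1", "H", b"M")   # stored at P1
store.write("L2", "H", b"M")   # stored at P2; HP now points at P2
store.delete("L1")             # removes the copy at P2
lost = store.read("L2")        # L2 can no longer be read
```

After the delete, the stale copy at P1 is still on disk while the copy that both logical addresses resolve to is gone, which is the failure a reference count is meant to prevent.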
To solve this problem, in this embodiment, when the metadata HP is constructed for the write data, a reference count of the write data is appended to the end of the physical address P. If the write data is being stored on the disk for the first time (no corresponding HP entry is found in the HP table for H), its reference count is set to 1. Otherwise, the reference count at the end of the physical address in the corresponding HP entry is read, incremented by 1, and written back to that HP entry.
In some embodiments of the present invention, the reference count is merged with the physical address by converting both values into character strings and concatenating them; when the reference count is used, it is converted back to an integer so that it can be incremented or decremented.
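A string-based merge of this kind could be sketched as follows. The separator character, the hexadecimal rendering of the address, and the helper names `pack`, `unpack`, and `bump` are all assumptions for illustration; the source only states that both values are converted to strings and merged.

```python
from typing import Tuple

SEP = ":"  # hypothetical separator between address and count


def pack(physical: int, count: int) -> str:
    """Merge a physical address and a reference count into one string."""
    return f"{physical:x}{SEP}{count}"


def unpack(value: str) -> Tuple[int, int]:
    """Split the merged string back into (physical address, count)."""
    addr, count = value.rsplit(SEP, 1)
    return int(addr, 16), int(count)


def bump(value: str) -> str:
    """Convert the count to an integer, add one, and merge again."""
    physical, count = unpack(value)
    return pack(physical, count + 1)
```

Splitting from the right keeps the count separable even if the address rendering itself were to contain the separator.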
In some embodiments of the present invention, the reference count is merged with the physical address in bitmap form, by combining the bitmap of the physical address with the bitmap of the reference count.
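One possible reading of the bitmap form is bit-field packing, where the reference count occupies the low bits of a single integer and the physical address occupies the bits above it. The 16-bit width of the count field and the helper names are assumptions made for this sketch; the source does not specify a layout.

```python
COUNT_BITS = 16                       # assumed width of the count field
COUNT_MASK = (1 << COUNT_BITS) - 1


def pack_bits(physical: int, count: int) -> int:
    """Place the count in the low bits, after the physical address bits."""
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("reference count does not fit in COUNT_BITS")
    return (physical << COUNT_BITS) | count


def unpack_bits(value: int):
    """Return (physical address, reference count) from the packed integer."""
    return value >> COUNT_BITS, value & COUNT_MASK
```

Compared with string concatenation, this keeps the HP value at a fixed width and lets the count be read with a single mask operation.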
As shown in fig. 2, in some embodiments of the present invention, verifying whether the write data is new data according to the hash value comprises:
step S401, searching the HP table according to the hash value;
step S402, in response to the hash value and its corresponding physical address existing in the HP table, determining that the write data is old data;
step S403, in response to the hash value and its corresponding physical address not existing in the HP table, determining that the write data is new data.
This embodiment describes how the invention verifies whether the write data is new: the hash value of the write data is looked up in the HP table of the metadata. As described above, under the concept of the present invention the HP table holds only one physical address P for each hash value H, and that physical address P and its reference count are updated continuously as duplicate data is written. Based on this storage mechanism, the steps for verifying whether the data is new are as follows:
In step S401, the storage system calls the corresponding query API to search the HP table, passing the hash value of the write data to the API. It should be noted that, in the present invention, the HP, LH and PL tables are all held in memory; they may be non-relational databases or memory-based B+ tree data structures. Lookups in metadata tables such as the HP table are therefore very fast, because once the metadata is restructured into LH, HP and PL, the key H of the HP table, the key L of the LH table and the key P of the PL table are all contiguous. In step S401 it is therefore only necessary to call the corresponding API to find out quickly whether the HP table contains an HP key-value pair whose hash value equals that of the write data.
In step S402, if the result returned by the API contains one HP key-value pair, the write data is treated as old data, that is, data that already exists.
In step S403, if the result returned by the API is None or null, the write data is treated as new data, that is, the corresponding data has not yet been stored on the disk.
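Steps S401 to S403 can be sketched as a single dictionary lookup. This is only an illustrative sketch: the in-memory `dict` stands in for the HP table, SHA-256 is an assumed hash function (the text does not name one), and `hash_of` and `is_new_data` are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical in-memory HP table: hash value -> (physical address, reference count).
HP = Dict[str, Tuple[int, int]]


def hash_of(data: bytes) -> str:
    # SHA-256 is an assumption; the text does not specify the hash function.
    return hashlib.sha256(data).hexdigest()


def is_new_data(hp_table: HP, data: bytes) -> bool:
    """Return True if the write data is new, False if it is old."""
    entry = hp_table.get(hash_of(data))  # S401: query the HP table by hash value
    # S402: an existing entry means old data; S403: None means new data.
    return entry is None


hp_table: HP = {hash_of(b"block-A"): (0x1000, 1)}
print(is_new_data(hp_table, b"block-A"))  # False (old data)
print(is_new_data(hp_table, b"block-B"))  # True (new data)
```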
As shown in fig. 3, in some embodiments of the invention, the method further comprises:
step S501, in response to the write data being old data, writing the write data to the disk and acquiring a new physical address of the write data on the disk; and
step S502, obtaining, from the HP table, the reference count appended to the end of the old physical address of the write data;
step S503, adding one to the value of the reference count, merging it onto the end of the new physical address, forming a new HP parameter together with the hash value of the write data, and storing the new HP parameter into the HP table; and
step S504, storing the old physical address into a corresponding garbage collection table, and cleaning up, at preset time intervals, the data on the disk pointed to by the physical addresses in the garbage collection table.
In this embodiment, in step S501, if the write data is verified to be old data, the write data is still written to the disk, and its physical address on the disk is obtained through the corresponding API.
In step S502, the reference count is separated from the physical address according to the way the two were merged. If they were merged as a character string, the reference count is split off with the corresponding string-splitting method and converted to an integer; if they were merged as bitmap data, the reference count is obtained by reading the corresponding bitmap data that follows the physical address.
In step S503, the obtained reference count is incremented by 1 and merged with the new physical address. The merged new physical address and reference count serve as the value of the HP metadata for the write data, the hash value of the write data serves as its key, and the pair is stored back into the HP table.
In step S504, because the physical address in the HP entry has changed during this process while the data pointed to by the old physical address, which is identical to the write data, still exists on the disk, the data in the physical storage space pointed to by the old physical address is cleared to improve space utilization. By placing the old physical address in the corresponding garbage collection table, the storage system can clear the garbage data periodically, for example every 30 seconds.
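The string-merging variant of steps S502 to S504 can be sketched as follows. This is a minimal sketch under stated assumptions: the `#` separator, the string form of addresses, and the names `split_value`, `rewrite_old_data` and `collect_garbage` are all hypothetical, and the periodic timer that would call `collect_garbage` is left out.

```python
from typing import Dict, List, Tuple

SEP = "#"  # assumed separator between physical address and reference count


def split_value(value: str) -> Tuple[str, int]:
    """S502: split 'address#count' into the address and an integer count."""
    address, count = value.rsplit(SEP, 1)
    return address, int(count)


def rewrite_old_data(hp_table: Dict[str, str], gc_table: List[str],
                     hash_value: str, new_address: str) -> None:
    """Handle old data that has just been written again at new_address (S501)."""
    old_address, count = split_value(hp_table[hash_value])   # S502
    hp_table[hash_value] = f"{new_address}{SEP}{count + 1}"  # S503
    gc_table.append(old_address)                             # S504: queue for cleanup


def collect_garbage(gc_table: List[str], disk: Dict[str, bytes]) -> None:
    """Clear the data behind every queued old address; meant to run periodically."""
    while gc_table:
        disk.pop(gc_table.pop(), None)


hp_table = {"h1": "0x1000#1"}
gc_table: List[str] = []
rewrite_old_data(hp_table, gc_table, "h1", "0x2000")
print(hp_table["h1"])  # 0x2000#2
```

Keeping the count inside the HP value means a single lookup yields both the current location and the number of stores, at the cost of a parse step on every rewrite.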
In some embodiments of the invention, the method further comprises:
in response to the LH parameter having been stored into the LH table and the HP parameter having been stored into the HP table, returning a write-success signal to the host.
In this embodiment, the metadata comprises three parameters, LH, HP and PL, and operations on LH, HP and PL are executed by different threads. In practice, because data is written continuously, the three threads take different amounts of time to process the same piece of data. To reduce the time the host waits for a response, the write-success signal is returned as soon as the LH and HP parameters have been stored, without waiting for the PL table to be updated, which improves storage performance between the storage system and the host.
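The early acknowledgement described here can be sketched with a thread pool: the three table updates are submitted together, and the success signal is returned once the LH and HP updates have finished. The thread pool, the artificial delay on the PL update, and the names `store` and `write_metadata` are assumptions made only for illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Tuple


def store(table: dict, key: str, value: str, delay: float = 0.0) -> None:
    """Store one key-value pair; delay simulates a slower table update."""
    time.sleep(delay)
    table[key] = value


def write_metadata(pool: ThreadPoolExecutor, lh: dict, hp: dict, pl: dict,
                   logical: str, hash_value: str, physical: str) -> Tuple[str, Future]:
    lh_done = pool.submit(store, lh, logical, hash_value)
    hp_done = pool.submit(store, hp, hash_value, physical)
    pl_done = pool.submit(store, pl, physical, logical, 0.2)  # slow PL update (illustrative)
    lh_done.result()  # wait only for LH ...
    hp_done.result()  # ... and HP before acknowledging the host
    return "WRITE_OK", pl_done  # PL may still be in flight


lh, hp, pl = {}, {}, {}
with ThreadPoolExecutor(max_workers=3) as pool:
    status, pl_future = write_metadata(pool, lh, hp, pl, "L1", "h1", "0x1000")
    print(status)  # WRITE_OK
```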
As shown in fig. 4, in some embodiments of the invention, the method further comprises:
step S701, in response to a data query request initiated by the host, parsing the logical address in the query request;
step S702, searching the LH table for the hash value corresponding to the logical address in the data query request; and
step S703, searching the HP table for the physical address corresponding to the hash value;
step S704, obtaining the physical address, and returning to the host the data content stored in the disk space that the physical address points to.
In this embodiment, with the metadata structure changed as described, data on the disk is accessed through the following steps:
In step S701, the storage system maintains real-time communication with the host, and when the host issues a query request, the storage system obtains the logical address from the query request.
In step S702, the logical address in the query request is used as a parameter, and the corresponding API is called to look up the hash value corresponding to the logical address in the LH table. If no entry for the logical address and hash value can be found, error information is returned to the host.
In step S703, if the hash value corresponding to the logical address is found, the physical address corresponding to the hash value is looked up in the HP table.
In step S704, if the corresponding physical address is obtained in step S703, the data in the storage space that the physical address points to is returned to the host.
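The read path of steps S701 to S704 amounts to two chained lookups followed by a disk read. In the sketch below the tables and the disk are plain dictionaries, the HP value holds only the physical address (the reference count is omitted for brevity), and `read_data` and `LookupError` as the error signal are hypothetical choices.

```python
from typing import Dict


def read_data(lh: Dict[str, str], hp: Dict[str, str],
              disk: Dict[str, bytes], logical: str) -> bytes:
    """Resolve logical address -> hash value -> physical address -> data."""
    hash_value = lh.get(logical)  # S702: look up the hash value in the LH table
    if hash_value is None:
        # S702: no matching entry, so an error is reported to the host
        raise LookupError(f"no data at logical address {logical}")
    physical = hp[hash_value]     # S703: look up the physical address in the HP table
    return disk[physical]         # S704: return the data at that address


lh = {"L1": "h1"}
hp = {"h1": "0x1000"}
disk = {"0x1000": b"hello"}
print(read_data(lh, hp, disk, "L1"))  # b'hello'
```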
As shown in fig. 5, in some embodiments of the invention, the method further comprises:
step S801, taking the number of times the write data has been stored to the disk as a reference count;
step S802, forming a reference count parameter HN from the hash value of the write data and the reference count, and storing the reference count parameter HN in a reference count HN table.
In this embodiment, in order to prevent data from being deleted by mistake during deletion and to support statistics on the data deduplication rate, the reference count is bound to the hash value and stored in a corresponding data table. The number of times the write data has been stored is taken as the reference count; the hash value of the write data is the key and the corresponding reference count is the value, forming a key-value pair HN. A reference count table, the HN table, is established for these pairs, and the HN pair of each distinct write data is written into the HN table.
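A minimal sketch of the HN table follows, using `collections.Counter` keyed by hash value. The names `record_store` and `duplicate_writes` are hypothetical, and the statistic shown is only one possible deduplication measure; the text does not define how the deduplication rate is computed.

```python
from collections import Counter


def record_store(hn: Counter, hash_value: str) -> int:
    """S801/S802: count one more store of the data identified by hash_value."""
    hn[hash_value] += 1
    return hn[hash_value]


def duplicate_writes(hn: Counter) -> int:
    """Assumed statistic: stores beyond the first one for each distinct hash."""
    return sum(hn.values()) - len(hn)


hn: Counter = Counter()
record_store(hn, "h1")
record_store(hn, "h1")
record_store(hn, "h2")
print(hn["h1"], hn["h2"])    # 2 1
print(duplicate_writes(hn))  # 1
```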
As shown in fig. 6, in some embodiments of the invention, the method further comprises:
step S901, in response to a data deletion request initiated by the host, parsing the logical address in the request;
step S902, searching the LH table for the hash value corresponding to the logical address in the data deletion request, and searching the reference count HN table for the reference count value corresponding to the hash value;
step S903, in response to the reference count value not being 0, subtracting one from the reference count value corresponding to the hash value;
step S904, in response to the reference count value being 0, searching the HP table for the physical address corresponding to the hash value, and deleting the data on the disk corresponding to the physical address.
In this embodiment, in step S901, after the storage system receives a data deletion request sent by the host, it parses the request and obtains the logical address of the data to be deleted.
In step S902, the corresponding API is called with the logical address to query the key-value pair of logical address and hash value stored in the LH table and obtain the hash value corresponding to the logical address; the reference count value corresponding to that hash value is then queried in the HN table.
In step S903, it is determined whether the reference count value corresponding to the hash value satisfies the deletion condition. If the reference count value is not 0, it is decremented by one and the HN table is updated.
In step S904, if the reference count value corresponding to the hash value is 0, the data corresponding to the current hash value is referenced by only one logical address, so when the host deletes the data at that logical address, the data can be cleared directly.
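The deletion rule of steps S901 to S904 can be sketched as below. The sketch follows the rule exactly as stated (decrement when the count is not 0, delete when it is 0); the dictionaries, the string return values and the name `delete_data` are assumptions, and cleanup of the LH, HP and HN entries themselves is not described in the text and is therefore left out.

```python
from typing import Dict


def delete_data(lh: Dict[str, str], hn: Dict[str, int], hp: Dict[str, str],
                disk: Dict[str, bytes], logical: str) -> str:
    hash_value = lh[logical]        # S902: logical address -> hash value (LH table)
    count = hn[hash_value]          # S902: hash value -> reference count (HN table)
    if count != 0:                  # S903: still referenced elsewhere
        hn[hash_value] = count - 1
        return "DECREMENTED"
    physical = hp[hash_value]       # S904: hash value -> physical address (HP table)
    disk.pop(physical, None)        # S904: clear the data on the disk
    return "DELETED"


lh = {"L1": "h1"}
hn = {"h1": 1}
hp = {"h1": "0x1000"}
disk = {"0x1000": b"x"}
print(delete_data(lh, hn, hp, disk, "L1"))  # DECREMENTED
print(delete_data(lh, hn, hp, disk, "L1"))  # DELETED
```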
As shown in fig. 7, another aspect of the present invention further provides a system for optimizing performance of a storage system, including:
a receiving module 1, configured to receive write data and a logical address thereof, and to calculate a hash value of the write data;
a verification module 2, configured to verify whether the write data is new data according to the hash value;
a processing module 3, configured to, in response to the write data being new data, write the write data to a disk and acquire a physical address of the write data on the disk; and
an execution module 4, configured to compose the logical address and the hash value of the write data into metadata and write the metadata into a corresponding table.
According to the method and system for optimizing the performance of a storage system provided by the invention, the original LP, PL and HP metadata are changed to LH, HP and PL, and the data deduplication mechanism is changed from keeping the old data and refusing to write the new copy to always writing the new data and deleting the old data according to the reference count value. The ordering of the PL table is preserved, so the PL table can grow in a balanced way under a suitable tree strategy, and inserting into and querying the PL table require no extra overflow-page handling, which reduces system overhead. Although the amount of data written to disk increases to some extent, RAID writes are performed in stripes and a write that is too small for a stripe produces meaningless write amplification, so writing the new data each time does not have an excessive impact on performance.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments described above exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for optimizing performance of a storage system, comprising:
receiving write data and a logical address thereof, and calculating a hash value of the write data;
verifying whether the write data is new data according to the hash value;
in response to the write data being new data, writing the write data to a disk and acquiring a physical address of the write data on the disk; and
composing the logical address and the hash value of the write data into metadata and writing the metadata into a corresponding table.
2. The method of claim 1, wherein the composing the logical address and the hash value of the write data into metadata and writing into the corresponding table comprises:
composing the logical address and the hash value of the write data into an LH parameter of the metadata and storing the LH parameter into an LH table;
composing the hash value and the physical address of the write data into an HP parameter of the metadata and storing the HP parameter into an HP table; and
composing the physical address and the logical address of the write data into a PL parameter of the metadata and storing the PL parameter into a PL table.
3. The method of claim 2, wherein composing the hash value and the physical address of the write data into an HP parameter of the metadata comprises:
taking the number of times the write data has been stored to the disk as a reference count, merging the reference count onto the end of the physical address, and forming the HP parameter together with the hash value.
4. The method of claim 3, wherein the verifying whether the written data is new data according to the hash value comprises:
searching the HP table according to the hash value;
in response to the hash value and its corresponding physical address existing in the HP table, determining that the write data is old data; and
in response to the hash value and its corresponding physical address not existing in the HP table, determining that the write data is new data.
5. The method of claim 4, further comprising:
in response to the write data being old data, writing the write data to the disk and acquiring a new physical address of the write data on the disk;
obtaining, from the HP table, the reference count appended to the end of the old physical address of the write data;
adding one to the value of the reference count, merging it onto the end of the new physical address, forming a new HP parameter together with the hash value of the write data, and storing the new HP parameter into the HP table; and
storing the old physical address into a corresponding garbage collection table, and cleaning up, at preset time intervals, the data on the disk pointed to by the physical addresses in the garbage collection table.
6. The method of claim 5, further comprising:
in response to the LH parameter having been stored into the LH table and the HP parameter having been stored into the HP table, returning a write-success signal to the host.
7. The method of claim 2, further comprising:
in response to a data query request initiated by a host, parsing the logical address in the query request;
searching the LH table for the hash value corresponding to the logical address in the data query request;
searching the HP table for the physical address corresponding to the hash value; and
obtaining the physical address, and returning to the host the data content stored in the disk space that the physical address points to.
8. The method of claim 2, further comprising:
taking the number of times the write data has been stored to the disk as a reference count; and
forming a reference count parameter HN from the hash value of the write data and the reference count, and storing the reference count parameter HN in a reference count HN table.
9. The method of claim 8, further comprising:
in response to a data deletion request initiated by a host, parsing the logical address in the request;
searching the LH table for the hash value corresponding to the logical address in the data deletion request, and searching the reference count HN table for the reference count value corresponding to the hash value;
in response to the reference count value not being 0, decrementing the reference count value corresponding to the hash value by one; and
in response to the reference count value being 0, searching the HP table for the physical address corresponding to the hash value, and deleting the data on the disk corresponding to the physical address.
10. A storage system performance optimization system, comprising:
a receiving module, configured to receive write data and a logical address thereof, and to calculate a hash value of the write data;
a verification module, configured to verify whether the write data is new data according to the hash value;
a processing module, configured to, in response to the write data being new data, write the write data to a disk and acquire a physical address of the write data on the disk; and
an execution module, configured to compose the logical address and the hash value of the write data into metadata and write the metadata into a corresponding table.
CN202110999485.XA 2021-08-29 2021-08-29 Storage system performance optimization method and system Active CN113867627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110999485.XA CN113867627B (en) 2021-08-29 2021-08-29 Storage system performance optimization method and system


Publications (2)

Publication Number Publication Date
CN113867627A true CN113867627A (en) 2021-12-31
CN113867627B CN113867627B (en) 2023-08-22

Family

ID=78988656


Country Status (1)

Country Link
CN (1) CN113867627B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant