WO2023029417A1 - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
WO2023029417A1
Authority
WO
WIPO (PCT)
Prior art keywords
physical address
data
address
virtual address
storage space
Prior art date
Application number
PCT/CN2022/078858
Other languages
French (fr)
Chinese (zh)
Inventor
蒲贵友
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023029417A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • the embodiments of the present application relate to the field of storage technologies, and in particular, to a data storage method and device.
  • Deduplication technology works as follows: when second data is to be written into a hard disk and the second data is exactly the same as first data already stored in the hard disk, the second data is not written to the hard disk again. Instead, the deduplication technology points the virtual address corresponding to the second data to the physical address corresponding to the first data, thereby avoiding repeated writing of the same data and saving storage space on the hard disk.
  • When the second data is read, the data (that is, the first data) may be read according to the pointing relationship between the virtual address corresponding to the second data and the physical address corresponding to the first data.
  • An existing data storage method based on deduplication technology is as follows: in the process of writing data to the hard disk, if the data corresponding to multiple virtual addresses is the same (for example, all are data A), the multiple virtual addresses all point to the physical address of the storage block storing data A, and the physical address, the virtual addresses corresponding to the physical address, and the number of those virtual addresses are recorded in the reverse mapping table (the number of virtual addresses is called the reference count).
  • Embodiments of the present application provide a data storage method and device capable of efficiently obtaining reference counts and virtual address sets corresponding to storage blocks.
  • An embodiment of the present application provides a data storage method, the method including: determining a first physical address corresponding to first data, where the first physical address is used to indicate at least one storage block in a first storage space; writing the first data into the first physical address; recording the mapping from a first virtual address to the first physical address in the forward mapping table, wherein the virtual address of the first data is the first virtual address, and the first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space; and then recording the first physical address in the meta-information table corresponding to the first storage space, and recording in the meta-information table the first virtual address corresponding to the first physical address.
  • In this method, the first data is written into the first physical address, which indicates at least one storage block in the first storage space, and the first physical address and the virtual address corresponding to it (such as the first virtual address) are recorded in the meta-information table corresponding to the first storage space, wherein the first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space. Therefore, when the virtual addresses corresponding to the first physical address need to be obtained, the meta-information table corresponding to the first storage space is obtained according to the identifier of the first storage space in the first physical address, the virtual addresses corresponding to the first physical address are then queried in that meta-information table according to the first physical address, and the number of virtual addresses is counted.
  • The above data storage method further includes: determining whether second data is the same as the above first data, where the second data is data to be written; if the second data is the same as the first data, recording the mapping from a second virtual address to the first physical address in the forward mapping table; and recording the second virtual address corresponding to the first physical address in the meta information table.
  • When the data storage device writes data to the hard disk, it first determines whether the data has already been written; if so, a mapping relationship is established between the virtual address corresponding to the data and the physical address corresponding to the data, and the data is not written to the hard disk again, thus saving the storage space of the hard disk.
  • the meta information table includes a virtual address set corresponding to the first physical address; the first virtual address and the second virtual address are added to the virtual address set.
  • The data storage method further includes: when performing garbage collection on the first storage space, writing the first data in the first physical address into a second physical address; recording the second physical address in the meta information table corresponding to the second storage space in which the second physical address is located, and adding the above-mentioned first virtual address and second virtual address to the virtual address set corresponding to the second physical address; recording in the forward mapping table a mapping from the first virtual address to the second physical address and a mapping from the second virtual address to the second physical address; then deleting the first data in the first physical address; deleting, from the meta information table of the first storage space, the first physical address and the first virtual address and second virtual address in the virtual address set corresponding to the first physical address; and deleting from the forward mapping table the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
  • The storage space in the embodiment of the present application may be a persistent log (Plog) storage space, which supports writing data in an append-write manner.
  • the embodiment of the present application provides a data storage device, including: a determination module, a data writing module, and an information recording module.
  • The determination module is used to determine the first physical address corresponding to the first data, where the first physical address is used to indicate at least one storage block in the first storage space.
  • The data writing module is used to write the first data into the first physical address.
  • The information recording module is used to record the mapping from the first virtual address to the first physical address in the forward mapping table, wherein the virtual address of the first data is the first virtual address, and the first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space.
  • The information recording module is also used to record the first physical address in the meta-information table corresponding to the first storage space, and to record in the meta-information table the first virtual address corresponding to the first physical address.
  • The above determination module is also used to determine whether the second data is the same as the first data, where the second data is data to be written; the information recording module is also used to record the mapping from the second virtual address to the first physical address in the forward mapping table when the second data is the same as the first data; the information recording module is also used to record the second virtual address corresponding to the first physical address in the meta information table.
  • the meta information table includes a virtual address set corresponding to the first physical address; the first virtual address and the second virtual address are added to the virtual address set.
  • the data storage device further includes: a deletion module.
  • The above-mentioned data writing module is also used to write the first data in the first physical address into the second physical address when garbage collection is performed on the first storage space; the information recording module is also used to record the second physical address in the meta information table corresponding to the second storage space, and to add the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address.
  • The information recording module is also used to record in the forward mapping table the mapping from the first virtual address to the second physical address and the mapping from the second virtual address to the second physical address.
  • The deletion module is used to delete the first data in the first physical address; to delete, from the meta information table of the first storage space, the first physical address and the first virtual address and second virtual address in the virtual address set corresponding to the first physical address; and to delete from the forward mapping table the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
  • The embodiment of the present application provides a storage device including a memory and a processor, wherein the memory is coupled to the processor; the memory is used to store computer program code, wherein the computer program code includes computer instructions; when the computer instructions are executed by the processor, the storage device executes the method described in any one of the first aspect and possible implementations thereof.
  • An embodiment of the present application provides a computer storage medium including computer instructions. When the computer instructions run on a computing device, the computing device is made to execute the method described in any one of the first aspect and its possible implementations.
  • the embodiments of the present application provide a computer program product, which, when run on a computer, causes the computer to execute the method described in any one of the above first aspect and possible implementations thereof.
  • FIG. 1 is a first schematic diagram of the relationship between a virtual address and a physical address provided by an embodiment of the present application.
  • FIG. 2 is a first schematic diagram of a storage system provided by an embodiment of the present application.
  • FIG. 3 is a first schematic diagram of a data storage method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hard disk structure provided by an embodiment of the present application.
  • FIG. 5a is a first schematic diagram of writing data provided by an embodiment of the present application.
  • FIG. 5b is a second schematic diagram of writing data provided by an embodiment of the present application.
  • FIG. 5c is a first schematic diagram of garbage collection provided by an embodiment of the present application.
  • FIG. 5d is a first schematic diagram of data deletion provided by an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of a data storage method provided by an embodiment of the present application.
  • FIG. 7 is a third schematic diagram of a data storage method provided by an embodiment of the present application.
  • FIG. 8 is a fourth schematic diagram of a data storage method provided by an embodiment of the present application.
  • FIG. 9 is a first schematic diagram of a data storage device provided by an embodiment of the present application.
  • FIG. 10 is a second schematic diagram of a data storage device provided by an embodiment of the present application.
  • first and second in the description and claims of the embodiments of the present application are used to distinguish different objects, rather than to describe a specific order of objects.
  • first physical address and the second physical address are used to distinguish different physical addresses, rather than describing a specific sequence of physical addresses.
  • Words such as “exemplary” or “for example” are used to indicate examples, illustrations, or explanations. Any embodiment or design scheme described as “exemplary” or “for example” in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as “exemplary” or “for example” is intended to present related concepts in a concrete manner.
  • multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.
  • The existing data storage method based on deduplication technology is as follows: when data is written to the hard disk (hereinafter the data to be written is referred to as target data A), the processor in the data storage system determines whether the target data A is already stored in the hard disk. If the target data A is not in the hard disk, the processor stores the target data A in the storage block corresponding to physical address 1 and points virtual address 1, which corresponds to the target data, to physical address 1. At the same time, the mapping relationship between virtual address 1 and physical address 1 is recorded in the forward mapping table, as shown in the first row of Table 1 below, wherein the forward mapping table includes at least the fields "virtual address" and "physical address": the "virtual address" field stores the virtual address corresponding to specific data, and the "physical address" field stores the physical address pointed to by that virtual address. In addition, the number of virtual addresses pointing to physical address 1 (abbreviated as the reference count) and the virtual address set pointing to physical address 1 are recorded in the reverse mapping table.
  • The target data A is migrated from physical address 1 to physical address 2; at the same time, the reference count and virtual address set pointing to physical address 1 are queried in the reverse mapping table, and the reference count and virtual address set corresponding to physical address 1 are updated to those corresponding to physical address 2; then, the virtual addresses in the virtual address set corresponding to physical address 1 in the reverse mapping table are redirected to physical address 2; finally, the mapping relationship between each such virtual address and physical address 1 is updated in the forward mapping table to a mapping relationship with physical address 2.
  • The embodiment of the present application provides a data storage method and device, specifically implemented as: determining the first physical address corresponding to the first data, where the first physical address is used to indicate at least one storage block in the first storage space; writing the first data into the first physical address; recording the mapping from the first virtual address to the first physical address in the forward mapping table, wherein the virtual address of the first data is the first virtual address, and the first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space; and recording the first physical address in the meta information table corresponding to the first storage space, and recording in the meta information table the first virtual address corresponding to the first physical address.
  • the reference count and virtual address set corresponding to the storage block can be obtained efficiently.
  • the data storage method and device provided by the embodiments of the present application can be applied to the storage system shown in FIG. 2, and the storage system can be a storage system of a solid-state hard disk.
  • the data storage device includes a main controller (referred to as: main control) 201 and a plurality of flash memory chips 205, wherein the main control 201 includes: a processor 202, a host interface 204, and n (n>0) channel controllers 203.
  • the above-mentioned master control 201 is used to issue executable commands to multiple flash memory chips 205 , so as to realize the process of reading or writing data on the flash memory chips 205 .
  • The above-mentioned host interface 204 is used to communicate with the host, receive the command request sent by the host, and forward the command request to the processor 202, wherein the host may be, but is not limited to, a device such as a server, a personal computer, or an array controller.
  • The processor 202 sends executable commands to the plurality of flash memory chips 205 according to the command request forwarded by the host interface 204, and the processor 202 includes one or more CPUs.
  • the CPU may be a single-core CPU (single-CPU) or a multi-core CPU (multi-CPU).
  • The channel controller 203 is used to carry the executable commands issued by the processor 202 to the plurality of flash memory chips 205.
  • The storage device further includes a bus 206; the processor 202, the channel controller 203, the host interface 204, and the flash memory chips 205 are generally connected to each other through the bus 206, or are connected to each other in other ways.
  • The host interface 204 in the main control 201 forwards a data write request to the processor 202 in the main control 201; according to the data write request, the processor 202 issues a data write instruction to the flash memory chip 205 through the channel controller 203, thereby writing the data into the flash memory chip 205.
  • the device for executing the data storage method provided by the embodiment of the present application may be the processor in the main controller in the storage system shown in FIG. 2 above. As shown in FIG. 3 , the method may include S310-S340.
  • the data storage device determines a first physical address corresponding to the first data.
  • The first data is data not yet stored in the hard disk.
  • The above-mentioned first physical address is used to indicate at least one storage block in the first storage space, wherein a storage space is a larger-granularity division of the hard disk. As shown in FIG. 4, a hard disk is divided into 3 storage spaces, each row being one storage space, and each storage space includes N storage blocks, where N is greater than or equal to 2 and less than the total number of storage blocks in the hard disk.
  • the above-mentioned flash memory chip 205 in FIG. 2 is composed of multiple storage spaces, or the above-mentioned multiple flash memory chips 205 constitute one storage space, and the size division method of the storage space is not limited in this application.
  • The storage space may be a persistent log (Plog) storage space, where the Plog supports writing data in an append-write manner.
  • The hard disk is divided into multiple Plogs, and each of the above storage spaces is equivalent to one Plog.
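The append-write behaviour of a Plog can be pictured with a minimal sketch. The class name `Plog`, its fields, and the fixed block capacity are assumptions made for illustration only; the application does not specify an interface.

```python
class Plog:
    """Hypothetical append-only storage space made of storage blocks."""

    def __init__(self, plog_id, capacity_blocks):
        self.plog_id = plog_id                  # identifier of this storage space
        self.capacity_blocks = capacity_blocks  # N storage blocks per storage space
        self.blocks = []                        # blocks are only ever appended

    def append(self, data):
        """Append one block of data and return its offset within this Plog."""
        if len(self.blocks) >= self.capacity_blocks:
            raise RuntimeError("Plog is full")
        self.blocks.append(data)
        return len(self.blocks) - 1

    def read(self, offset):
        """Return the data stored in the block at the given offset."""
        return self.blocks[offset]


# Example: two appends land at consecutive offsets.
plog = Plog(plog_id=3, capacity_blocks=4)
first_offset = plog.append(b"data A")
second_offset = plog.append(b"data B")
```

In this sketch the offset returned by `append` plays the role of the offset address inside the storage space.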
  • the above-mentioned first physical address may be calculated by the data storage device according to the first data, or may be randomly allocated by the memory, which is not limited in this embodiment of the present application.
  • the data storage device writes the first data into the first physical address.
  • The first data may be stored in one storage block in the first storage space or in multiple storage blocks in the first storage space, depending on the size of the first data and the available capacity of the storage blocks.
  • The above-mentioned first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space (that is, the offset address, within the first storage space, of the storage block used to store the first data).
  • For example, the first physical address is Plog id + offset, wherein Plog id is the id corresponding to the Plog on the hard disk, and offset is the offset address, within the Plog, of the storage block storing the first data. For example, if the 100th storage block in the Plog is used to store the first data, the offset is 100.
  • The above-mentioned writing of the first data into at least one storage block in the first storage space indicated by the first physical address may specifically be: determining the first storage space according to the unique identifier (i.e., the Plog id) of the first storage space in the first physical address; then determining the storage block for storing the first data according to the offset address, in the first physical address, of that storage block within the first storage space; and finally writing the first data into the storage block.
  • For example, the data storage device finds, according to the first physical address, the Plog whose Plog id is 3, then locates the 100th storage block in that Plog counting from left to right, and finally writes the first data into the 100th storage block in the Plog.
  • The above-mentioned first physical address may also be a specific address on the hard disk consisting of multiple hexadecimal characters, where the storage location corresponding to the address is one or more storage blocks in the first storage space.
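For the Plog id + offset form of the physical address, the two-step resolution (find the storage space by its identifier, then find the storage block by its offset) might look like the following sketch. The names `PhysicalAddress`, `disk`, `write_block`, and `read_block`, and the value of `BLOCKS_PER_PLOG`, are hypothetical; append-only ordering is not enforced here, to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    plog_id: int  # identifier of the storage space (Plog)
    offset: int   # offset of the storage block inside that Plog


# Hypothetical disk layout: Plog id -> list of storage blocks.
BLOCKS_PER_PLOG = 128
disk = {plog_id: [None] * BLOCKS_PER_PLOG for plog_id in (1, 2, 3)}


def write_block(addr, data):
    plog = disk[addr.plog_id]  # step 1: find the storage space by its identifier
    plog[addr.offset] = data   # step 2: find the block by its offset and write


def read_block(addr):
    return disk[addr.plog_id][addr.offset]


# Example from the text: Plog id 3, offset 100.
first_pa = PhysicalAddress(plog_id=3, offset=100)
write_block(first_pa, b"first data")
```

Making the address a frozen dataclass keeps it hashable, so it can later serve as a dictionary key in a mapping table.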
  • the data storage device records the mapping from the first virtual address to the first physical address in the forward mapping table.
  • The above-mentioned first virtual address is the virtual address corresponding to the first data. The above-mentioned forward mapping table is used to record the mapping relationship between virtual addresses and physical addresses, wherein a virtual address points to a physical address; for the records of the forward mapping table, refer to Table 1 above.
  • A physical address has a mapping relationship with at least one virtual address; that is, in the forward mapping table, the relationship between a physical address and virtual addresses can be one-to-many or one-to-one.
  • When the data storage device reads the data corresponding to a certain virtual address, it looks up the physical address corresponding to that virtual address in the forward mapping table and then reads the data from the storage block corresponding to that physical address.
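The read path through the forward mapping table can be sketched as a dictionary lookup followed by a block read. The dictionary names and the tuple form `(plog_id, offset)` for a physical address are illustrative assumptions, not structures defined by the application.

```python
# Hypothetical storage: physical address (plog_id, offset) -> stored data.
storage_blocks = {(3, 100): b"data A"}

# Forward mapping table: virtual address -> physical address.
# Two virtual addresses pointing to one physical address is the one-to-many case.
forward_map = {"VA1": (3, 100), "VA2": (3, 100)}


def read(virtual_address):
    physical_address = forward_map[virtual_address]  # look up the physical address
    return storage_blocks[physical_address]          # read the storage block


data_for_va2 = read("VA2")
```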
  • the data storage device records the first physical address in the meta information table corresponding to the first storage space, and records the first virtual address corresponding to the first physical address in the meta information table.
  • The above-mentioned meta information table is in one-to-one correspondence with the first storage space; that is, one storage space corresponds to one meta information table, and the identifier of the meta information table is the identifier of the storage space corresponding to it. In other words, the meta information table corresponding to a storage space can be obtained according to the identifier of that storage space. The meta information table is used to record physical addresses and the virtual address sets pointing to them, wherein a physical address and the virtual address set pointing to it can be stored in the form of a key-value pair.
  • The key is used to record the physical address, and the value corresponding to the key is used to record all virtual addresses pointing to that physical address.
  • For example, virtual address 1 points to physical address 1, and virtual address 10 points to physical address 10, wherein both physical address 1 and physical address 10 belong to the first storage space. The meta information table corresponding to the first storage space is then as shown in Table 5: physical address 1 corresponds to virtual address 1, and physical address 10 corresponds to virtual address 10.
  • the present application does not limit the execution sequence of S330 and S340 above, that is, the data storage device may execute S330 first and then S340, and the data storage device may also execute S340 first and then S330.
  • In this method, the first data is stored in at least one storage block in the first storage space indicated by the first physical address, and the first physical address and the virtual address corresponding to the first physical address are recorded in the meta information table corresponding to the first storage space, wherein the first physical address includes the identifier of the first storage space and the offset address, in the first storage space, of the storage block storing the first data. Therefore, when the number of virtual addresses pointing to the first physical address needs to be obtained, the meta information table corresponding to the first storage space is obtained according to the identifier of the first storage space in the first physical address, the virtual address set corresponding to the first physical address is then queried in that meta information table according to the first physical address, and the number of virtual addresses in the virtual address set is counted to obtain the number of virtual addresses pointing to the first physical address.
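A per-storage-space meta information table, keyed by physical address with a virtual address set as the value, can be sketched as nested dictionaries. All names below are hypothetical, and physical addresses are modelled as `(plog_id, offset)` tuples for brevity.

```python
# Storage space identifier -> meta information table for that storage space.
# Each table maps a physical address (key) to its virtual address set (value).
meta_tables = {}


def record_virtual_address(physical_address, virtual_address):
    plog_id, _offset = physical_address
    table = meta_tables.setdefault(plog_id, {})  # table is found by storage space id
    table.setdefault(physical_address, set()).add(virtual_address)


def virtual_address_set(physical_address):
    plog_id, _offset = physical_address
    return meta_tables.get(plog_id, {}).get(physical_address, set())


def reference_count(physical_address):
    # The reference count is derived by counting the set, not stored separately.
    return len(virtual_address_set(physical_address))


# Example from the text: physical addresses 1 and 10 in the first storage space.
record_virtual_address((1, 1), "VA1")
record_virtual_address((1, 10), "VA10")
```

Because the storage space identifier is part of the physical address, only one small table is consulted per lookup rather than a single global reverse mapping table.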
  • the data storage method provided by the embodiment of the present application further includes: S610-S630.
  • the data storage device determines whether the second data is the same as the first data.
  • The above-mentioned method for determining whether the second data is the same as the first data may be as follows: the data storage device calculates the fingerprint information corresponding to the second data (for example, the hash value corresponding to the second data) according to the second data, and then compares the fingerprint information corresponding to the second data with the fingerprint information corresponding to the first data. If the fingerprint information corresponding to the second data is completely the same as the fingerprint information corresponding to the first data, the second data is the same as the first data; if the two are not completely the same, the second data is different from the first data.
  • the method for determining whether the second data is the same as the first data is not limited.
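One possible fingerprint comparison is sketched below. The choice of SHA-256 is an assumption for illustration; the application only mentions "a hash value" and does not limit the comparison method. The function names are likewise hypothetical.

```python
import hashlib


def fingerprint(data):
    """Return a fingerprint of the data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def is_same_data(second_data, first_fingerprint):
    """Compare the fingerprint of the second data with that of the first data."""
    return fingerprint(second_data) == first_fingerprint


first_fp = fingerprint(b"data A")
same = is_same_data(b"data A", first_fp)
different = is_same_data(b"data B", first_fp)
```

A production system might additionally compare the raw bytes after a fingerprint match to rule out hash collisions; that step is outside what the text describes.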
  • the data storage device records the mapping from the second virtual address to the first physical address in the forward mapping table.
  • the second virtual address is a virtual address corresponding to the second data, and the second virtual address points to the first physical address.
  • the data storage device records the second virtual address corresponding to the first physical address in the meta information table.
  • the aforementioned recording of the second virtual address corresponding to the first physical address in the meta information table may be adding the second virtual address to the set of virtual addresses corresponding to the first physical address in the meta information table.
  • For example, virtual address 1 points to physical address 1, virtual address 2 also points to physical address 1, and virtual address 10 points to physical address 10, wherein both physical address 1 and physical address 10 belong to the first storage space. The meta information table corresponding to the first storage space may then be as shown in Table 6 below, where physical address 1 corresponds to virtual address 1 and virtual address 2, and physical address 10 corresponds to virtual address 10.
  • When the data storage device writes data to the hard disk, it determines whether the data has already been written; if so, a mapping relationship is established between the virtual address corresponding to the data and the physical address corresponding to the data, and the data is not written to the hard disk again, so that the storage space of the hard disk can be saved.
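Putting the write path together, a deduplicating write might be sketched as follows. The class `DedupStore`, its single Plog with id 0, the fingerprint index, and the use of SHA-256 are all assumptions for illustration; overwriting a virtual address that already holds other data is not handled.

```python
import hashlib


class DedupStore:
    """Hypothetical store with one Plog (id 0), used only to illustrate dedup."""

    def __init__(self):
        self.plog = []         # appended storage blocks of Plog 0
        self.forward_map = {}  # virtual address -> physical address
        self.meta_table = {}   # physical address -> virtual address set
        self.fingerprints = {} # fingerprint -> physical address (assumed index)

    def write(self, virtual_address, data):
        fp = hashlib.sha256(data).hexdigest()
        physical_address = self.fingerprints.get(fp)
        if physical_address is None:
            # New data: append it and remember where it was stored.
            self.plog.append(data)
            physical_address = (0, len(self.plog) - 1)
            self.fingerprints[fp] = physical_address
        # In both cases, record the forward mapping and the meta table entry.
        self.forward_map[virtual_address] = physical_address
        self.meta_table.setdefault(physical_address, set()).add(virtual_address)
        return physical_address


store = DedupStore()
pa1 = store.write("VA1", b"data A")
pa2 = store.write("VA2", b"data A")  # duplicate: no second block is written
```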
  • the data storage method provided in the embodiment of the present application is based on FIG. 6 , and as shown in FIG. 7 , the data storage method further includes: S710-S770.
  • When performing garbage collection on the first storage space, the data storage device writes the first data in the first physical address into the second physical address.
  • The above-mentioned second physical address indicates at least one storage block in the second storage space, wherein the second physical address includes the identifier of the second storage space and the offset address of the second physical address in the second storage space.
  • The first storage space and the second storage space are different storage spaces; that is, the above-mentioned second storage space is a storage space on the hard disk other than the first storage space, which is not otherwise limited in this application.
  • The method for writing the first data in the first physical address into at least one storage block in the second storage space indicated by the second physical address is as follows: the storage space corresponding to the first physical address (that is, the first storage space) is determined according to the storage space identifier in the first physical address; then, in the first storage space, the storage block storing the first data is determined according to the offset address in the first physical address, and the first data is read from that storage block; finally, the first data that was read is written into at least one storage block in the second storage space indicated by the second physical address.
  • The method of writing the first data into the second physical address is similar to the method of writing the first data into the first physical address in S320; for details, refer to the description above, which is not repeated here.
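The read-then-write migration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `PhysicalAddress` tuple, the `StorageSpace` class, and the dictionary-of-blocks layout are hypothetical names chosen for the example. The source only states that a physical address carries a storage-space identifier and an offset address.

```python
from typing import Dict, NamedTuple


class PhysicalAddress(NamedTuple):
    """Hypothetical layout: the source only says a physical address holds
    a storage-space identifier and an offset inside that space."""
    space_id: int
    offset: int


class StorageSpace:
    """Toy storage space: maps an offset to the bytes stored in that block."""

    def __init__(self, space_id: int) -> None:
        self.space_id = space_id
        self.blocks: Dict[int, bytes] = {}


def migrate(spaces: Dict[int, StorageSpace],
            src: PhysicalAddress, dst: PhysicalAddress) -> None:
    # 1. The identifier in the source address selects the storage space.
    src_space = spaces[src.space_id]
    # 2. The offset selects the block that holds the data; read it.
    data = src_space.blocks[src.offset]
    # 3. Write the data into the block indicated by the destination address.
    spaces[dst.space_id].blocks[dst.offset] = data


spaces = {1: StorageSpace(1), 2: StorageSpace(2)}
spaces[1].blocks[0] = b"first data"
migrate(spaces, PhysicalAddress(1, 0), PhysicalAddress(2, 0))
```

Note that the sketch only copies the data; removing the original copy and updating the mapping tables are separate steps.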
  • the data storage device records the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and adds the first virtual address and the second virtual address to a virtual address set corresponding to the second physical address.
  • Specifically, the meta information table corresponding to the first storage space is obtained according to the first physical address, and the virtual address set pointing to the first physical address is then looked up in that table by the first physical address. The second physical address is then recorded in the meta information table of the second storage space, and the first virtual address and the second virtual address are added to the virtual address set corresponding to the second physical address.
  • the data storage device records the mapping from the first virtual address to the second physical address in the forward mapping table.
  • The method of recording the first virtual address and the second physical address in the forward mapping table is similar to the method of recording the first virtual address and the first physical address in the forward mapping table in S330 above; for details, refer to the description of the above embodiment, which is not repeated here.
  • the data storage device records the mapping from the second virtual address to the second physical address in the forward mapping table.
  • the data storage device deletes the first data in the first physical address.
  • The data storage device deletes, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address.
  • Deleting the first virtual address and the second virtual address here means deleting all virtual addresses in the virtual address set corresponding to the first physical address.
  • the data storage device deletes the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address in the forward mapping table.
  • The meta information table corresponding to the second storage space is shown in Table 7 below, in which virtual address 1 and virtual address 2 are each mapped to physical address 2.
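The garbage-collection flow of S710-S770 can be sketched roughly as below. All names (`blocks`, `forward_map`, `meta_tables`, `garbage_collect`) and the `(space id, offset)` tuple encoding of a physical address are assumptions made for illustration; the source does not prescribe concrete data structures.

```python
from typing import Dict, Set, Tuple

PhysAddr = Tuple[int, int]  # (storage-space id, offset): assumed encoding

blocks: Dict[PhysAddr, bytes] = {}                      # data per physical address
forward_map: Dict[str, PhysAddr] = {}                   # virtual -> physical
meta_tables: Dict[int, Dict[PhysAddr, Set[str]]] = {}   # one meta table per space


def garbage_collect(old: PhysAddr, new: PhysAddr) -> None:
    """Move the data at `old` to `new` and repoint every virtual address."""
    # Write the data into the second physical address.
    blocks[new] = blocks[old]
    # Find the virtual address set through the first space's own meta table.
    vaddrs = set(meta_tables[old[0]][old])
    # Record the second physical address together with its virtual address set.
    meta_tables.setdefault(new[0], {})[new] = set(vaddrs)
    # Repoint each virtual address in the forward mapping table. With a dict
    # keyed by virtual address, overwriting also drops the old mapping.
    for va in vaddrs:
        forward_map[va] = new
    # Delete the old data and the old meta-table record.
    del blocks[old]
    del meta_tables[old[0]][old]


old, new = (1, 0), (2, 0)
blocks[old] = b"A"
forward_map.update({"va1": old, "va2": old})
meta_tables[1] = {old: {"va1", "va2"}}
garbage_collect(old, new)
```

Because the forward mapping table here is a dictionary keyed by virtual address, one assignment both records the new mapping and removes the old one; a table with a different layout may need an explicit delete step.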
  • the data storage method provided by the embodiment of the present application further includes: S810-S830.
  • the data storage device deletes the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address in the forward mapping table.
  • The data storage device deletes, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address.
  • This deletion is similar to the deletion, in S760 above, of the first physical address in the meta information table of the first storage space and of the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address. For details, refer to the description above, which is not repeated here.
  • the data storage device deletes the first data in the first physical address.
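The deletion flow of S810-S830 can be sketched as a single helper. This is an illustrative sketch only: `delete_block` and its parameters are hypothetical names, the `(space id, offset)` tuple is an assumed encoding of a physical address, and the excerpt does not state what triggers the deletion.

```python
from typing import Dict, Set, Tuple

PhysAddr = Tuple[int, int]  # (storage-space id, offset): assumed encoding


def delete_block(pa: PhysAddr,
                 blocks: Dict[PhysAddr, bytes],
                 forward_map: Dict[str, PhysAddr],
                 meta_tables: Dict[int, Dict[PhysAddr, Set[str]]]) -> int:
    """Remove the data at `pa` and all bookkeeping that refers to it.

    Returns the number of virtual addresses that were unmapped.
    """
    meta = meta_tables[pa[0]]        # meta table of the space that owns `pa`
    vaddrs = set(meta[pa])           # every virtual address pointing at `pa`
    for va in vaddrs:                # delete the forward mappings
        del forward_map[va]
    del meta[pa]                     # delete the meta-table record
    del blocks[pa]                   # delete the data itself
    return len(vaddrs)


blocks = {(1, 0): b"A"}
forward_map = {"va1": (1, 0), "va2": (1, 0)}
meta_tables = {1: {(1, 0): {"va1", "va2"}}}
removed = delete_block((1, 0), blocks, forward_map, meta_tables)
```

The virtual address set is read from the meta information table first, because it is the only place in this sketch that lists which forward mappings must be removed.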
  • The embodiment of the present application provides a data storage device configured to execute the steps of the above data storage method. The functional modules of the data storage device may be divided according to the above method examples; for example, one functional module may be provided for each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the division of modules in the embodiment of this application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 9 shows a possible structural diagram of the data storage device involved in the above embodiment.
  • The data storage device includes a determination module 901, a data writing module 902, and an information recording module 903.
  • The determination module 901 is configured to determine the first physical address corresponding to the first data, for example, execute step S310 in the above method embodiment.
  • the data writing module 902 is configured to write the first data into the first physical address, for example, execute step S320 in the above method embodiment.
  • the information recording module 903 is configured to record the mapping from the first virtual address to the first physical address in the forward mapping table, for example, execute step S330 in the above method embodiment.
  • the information recording module 903 is further configured to record the first physical address in the meta information table corresponding to the first storage space, and record the first virtual address corresponding to the first physical address in the meta information table corresponding to the first storage space, for example, execute Step S340 in the above method embodiment.
  • the determination module 901 is configured to determine whether the second data is the same as the first data, for example, execute step S610 in the above method embodiment.
  • the above information recording module 903 is also configured to record the mapping from the second virtual address to the first physical address in the forward mapping table when the second data is the same as the first data, for example, perform step S620 in the above method embodiment .
  • the above information recording module 903 is configured to record the second virtual address corresponding to the first physical address in the meta information table, for example, execute step S630 in the above method embodiment.
  • The data storage device provided by the embodiment of the present application further includes a deletion module 904.
  • the data writing module 902 is further configured to write the first data in the first physical address into the second physical address when performing garbage collection on the first storage space, for example, execute step S710 in the above method embodiment.
  • The information recording module 903 is further configured to record the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and to add the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address, for example, execute step S720 in the above method embodiment.
  • The information recording module 903 is further configured to record, in the forward mapping table, the mapping from the first virtual address to the second physical address and the mapping from the second virtual address to the second physical address, for example, execute step S730 and step S740 in the above method embodiment.
  • the deletion module 904 is configured to delete the first data in the first physical address, for example, execute step S750 in the above method embodiment.
  • The deletion module 904 is further configured to delete, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address, for example, execute step S760 in the above method embodiment.
  • the deletion module 904 is also used to delete the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address in the forward mapping table, for example, execute step S770 and step S810 in the above method embodiment .
  • Each module of the above-mentioned data storage device can also be used to perform other actions in the above-mentioned method embodiment. All relevant content of each step involved in the above-mentioned method embodiment can be referred to the function description of the corresponding functional module, and will not be repeated here.
  • the data storage device includes: a processing module 1001 and a communication module 1002 .
  • The processing module 1001 is used to control and manage the actions of the data storage device, for example, to execute the steps performed by the determination module 901, the data writing module 902, the information recording module 903, and the deletion module 904, and/or other processes of the techniques described herein.
  • the communication module 1002 is used to support the interaction between the data storage device and other devices. As shown in FIG. data and second data etc.
  • the processing module 1001 may be a processor or a controller, for example, the processor 202 in the main controller 201 in FIG. 2 .
  • the communication module 1002 may be a transceiver, an RF circuit, or a communication interface, etc., such as the host interface 204 in the main controller 201 and/or the channel controller 203 in the main controller 201 in FIG. 2 .
  • the storage module 1003 may be a memory, such as the memory chip 205 in FIG. 2 .
  • All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, all or part of the processes or functions according to the embodiments of the present application will be generated.
  • the computer can be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)).
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to realize the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The essence of the technical solution of this application, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes various media capable of storing program code, such as flash memory, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disk.

Abstract

A data storage method, comprising: determining a first physical address corresponding to first data, the first physical address being used for indicating at least one storage block of a first storage space; writing the first data into the first physical address; recording the mapping from a first virtual address to the first physical address in a forward mapping table, a virtual address of the first data being the first virtual address, and the first physical address comprising an identifier of the first storage space and an offset address of the first physical address in the first storage space; and recording the first physical address in a meta information table corresponding to the first storage space, and recording the first virtual address corresponding to the first physical address in the meta information table. Also disclosed is a data storage device.

Description

A data storage method and device
This application claims priority to Chinese patent application No. 202111017305.X, entitled "A Data Storage Method and Device" and filed with the State Intellectual Property Office on August 31, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of storage technologies, and in particular, to a data storage method and device.
Background
As is well known, when second data is written to a hard disk and is identical to first data already stored on that hard disk, the second data is not written again. Instead, deduplication is used: the virtual address corresponding to the second data is pointed to the physical address corresponding to the first data, which avoids writing the same data to the hard disk repeatedly and saves hard disk storage space. Subsequently, the data (that is, the first data) can be read according to the pointing relationship between the virtual address corresponding to the second data and the physical address corresponding to the first data.
One existing data storage method based on deduplication works as follows: while data is being written to the hard disk, if the data corresponding to multiple virtual addresses is the same (for example, all of it is data A), the multiple virtual addresses are all pointed to the physical address of the storage block storing data A, and the physical address, the virtual addresses corresponding to the physical address, and the number of those virtual addresses (called the reference count) are recorded in a reverse mapping table.
When garbage collection is performed on the storage block of the data storage system that stores data A (the physical address of this storage block being physical address 1), data A can be migrated to the storage block corresponding to physical address 2. The reference count and the virtual address set corresponding to physical address 1 are looked up in the reverse mapping table, and physical address 1 in that record is updated to physical address 2; that is, the reference count and virtual address set corresponding to physical address 1 in the reverse mapping table become the reference count and virtual address set corresponding to physical address 2. However, because the reverse mapping table contains the reverse mapping relationships of all storage blocks on the hard disk, it holds a large amount of content, so looking up the reference count and virtual address set pointing to physical address 1 takes a long time, which makes obtaining the reference count and virtual address set corresponding to the physical address inefficient.
Summary
Embodiments of the present application provide a data storage method and device capable of efficiently obtaining the reference count and virtual address set corresponding to a storage block.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a data storage method. The method includes: determining a first physical address corresponding to first data, where the first physical address indicates at least one storage block in a first storage space; writing the first data into the first physical address; recording, in a forward mapping table, a mapping from a first virtual address to the first physical address, where the virtual address of the first data is the first virtual address, and the first physical address includes an identifier of the first storage space and an offset address of the first physical address in the first storage space; and then recording the first physical address in a meta information table corresponding to the first storage space, and recording, in that meta information table, the first virtual address corresponding to the first physical address.
In the data storage method provided by the embodiment of the present application, the first data is written into the first physical address, which indicates at least one storage block in the first storage space, and the virtual address corresponding to the first physical address (for example, the first virtual address) is recorded in the meta information table corresponding to the first storage space. The first physical address includes the identifier of the first storage space and the offset address of the first physical address in the first storage space. Therefore, when all virtual addresses corresponding to the first physical address need to be obtained, the meta information table corresponding to the first storage space is obtained according to the identifier of the first storage space in the first physical address; the virtual addresses corresponding to the first physical address are then looked up in that meta information table by the first physical address, and the number of virtual addresses is counted. Because the meta information table corresponding to the first storage space stores only the virtual addresses corresponding to the physical addresses of storage blocks in the first storage space, only the records related to physical addresses in the first storage space need to be traversed, and the records related to physical addresses on the entire hard disk need not be traversed. Querying the meta information table corresponding to the first storage space therefore takes less time, which improves the efficiency of obtaining the reference count and virtual address set corresponding to a storage block.
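The write path and the per-space lookup described above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the `Store` class, its method names, and the `(space id, offset)` tuple encoding of a physical address are all assumptions introduced for the example.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

PhysAddr = Tuple[int, int]  # (storage-space id, offset): assumed encoding


class Store:
    def __init__(self) -> None:
        self.blocks: Dict[PhysAddr, bytes] = {}
        self.forward_map: Dict[str, PhysAddr] = {}
        # One meta information table per storage space, keyed by space id.
        self.meta_tables: Dict[int, Dict[PhysAddr, Set[str]]] = defaultdict(dict)

    def write(self, vaddr: str, data: bytes, pa: PhysAddr) -> None:
        self.blocks[pa] = data                  # write data to the physical address
        self.forward_map[vaddr] = pa            # forward mapping: virtual -> physical
        # Record the physical address and its virtual address in the
        # meta table of the space named by the identifier in `pa`.
        self.meta_tables[pa[0]].setdefault(pa, set()).add(vaddr)

    def references(self, pa: PhysAddr) -> Tuple[Set[str], int]:
        """Return the virtual address set of `pa` and its reference count."""
        table = self.meta_tables.get(pa[0], {})  # only this space's table is used
        vaddrs = table.get(pa, set())
        return vaddrs, len(vaddrs)


store = Store()
store.write("va1", b"A", (1, 0))
store.write("va9", b"B", (2, 0))
vaddrs, count = store.references((1, 0))
```

The lookup never touches the table of storage space 2, which is the point of keeping one meta information table per storage space rather than one table for the whole hard disk.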
In a possible implementation, the data storage method further includes: determining whether second data is the same as the first data, where the second data is data to be written; and, when the second data is the same as the first data, recording a mapping from a second virtual address to the first physical address in the forward mapping table, and recording, in the meta information table, the second virtual address corresponding to the first physical address.
In the data storage method provided by the embodiment of the present application, when the data storage device writes data to the hard disk, it first determines whether the data has already been written. If so, it establishes a mapping between the virtual address corresponding to the data and the physical address corresponding to the data, and does not write the data to the hard disk again, which saves storage space on the hard disk.
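A deduplicating write along these lines might look like the sketch below. The source does not say how the device decides that two pieces of data are the same; the SHA-256 fingerprint index used here is purely an assumption, as are the names `write_dedup`, `fingerprints`, and the `(space id, offset)` tuple encoding.

```python
import hashlib
from typing import Dict, Set, Tuple

PhysAddr = Tuple[int, int]  # (storage-space id, offset): assumed encoding

blocks: Dict[PhysAddr, bytes] = {}
forward_map: Dict[str, PhysAddr] = {}
meta_tables: Dict[int, Dict[PhysAddr, Set[str]]] = {}
fingerprints: Dict[bytes, PhysAddr] = {}  # assumed helper index, not from the source


def write_dedup(vaddr: str, data: bytes, free_pa: PhysAddr) -> PhysAddr:
    """Write `data` for `vaddr`, reusing an existing block for duplicate data.

    `free_pa` is where new data would be stored; it stays unused for duplicates.
    """
    digest = hashlib.sha256(data).digest()
    pa = fingerprints.get(digest)
    # Treat the data as new unless an identical block is already stored
    # (the byte comparison guards against fingerprint collisions).
    if pa is None or blocks[pa] != data:
        pa = free_pa
        blocks[pa] = data
        fingerprints[digest] = pa
    # In both cases, record the forward mapping and the meta-table entry.
    forward_map[vaddr] = pa
    meta_tables.setdefault(pa[0], {}).setdefault(pa, set()).add(vaddr)
    return pa


first = write_dedup("va1", b"same payload", (1, 0))
second = write_dedup("va2", b"same payload", (1, 1))
```

After the two calls, both virtual addresses point at the same physical address and only one copy of the payload is stored.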
In a possible implementation, the meta information table contains a virtual address set corresponding to the first physical address, and the first virtual address and the second virtual address are added to this virtual address set.
In a possible implementation, the data storage method further includes: when garbage collection is performed on the first storage space, writing the first data in the first physical address into a second physical address; recording the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and adding the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address; recording, in the forward mapping table, the mapping from the first virtual address to the second physical address and the mapping from the second virtual address to the second physical address; then deleting the first data in the first physical address; deleting, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address; and deleting, from the forward mapping table, the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
In a possible implementation, the storage spaces in the embodiments of the present application (including the first storage space and the second storage space) may be persistent log (Plog) storage spaces, which support writing data in an append-only manner.
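Append-only writing can be illustrated with a toy log. The `Plog` class below is a hypothetical stand-in written for this example, not the interface of any real Plog implementation; the capacity check and the `(space id, offset)` return value are assumptions.

```python
from typing import Tuple


class Plog:
    """Toy append-only log: data can only be added at the current tail."""

    def __init__(self, space_id: int, capacity: int) -> None:
        self.space_id = space_id
        self.capacity = capacity
        self.buf = bytearray()

    def append(self, data: bytes) -> Tuple[int, int]:
        """Append `data` and return (space id, offset) of where it landed."""
        if len(self.buf) + len(data) > self.capacity:
            raise ValueError("Plog is full")
        offset = len(self.buf)   # new data always starts at the tail
        self.buf += data
        return (self.space_id, offset)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])


plog = Plog(space_id=1, capacity=16)
a = plog.append(b"AAAA")
b = plog.append(b"BB")
```

The pair returned by `append` matches the shape of a physical address in this document: the identifier of the storage space plus an offset within it.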
In a second aspect, an embodiment of the present application provides a data storage device, including a determination module, a data writing module, and an information recording module. The determination module is configured to determine a first physical address corresponding to first data, where the first physical address indicates at least one storage block in a first storage space. The data writing module is configured to write the first data into the first physical address. The information recording module is configured to record, in a forward mapping table, a mapping from a first virtual address to the first physical address, where the virtual address of the first data is the first virtual address, and the first physical address includes an identifier of the first storage space and an offset address of the first physical address in the first storage space. The information recording module is further configured to record the first physical address in a meta information table corresponding to the first storage space, and to record, in the meta information table, the first virtual address corresponding to the first physical address.
In a possible implementation, the determination module is further configured to determine whether second data is the same as the first data, where the second data is data to be written. The information recording module is further configured to record, when the second data is the same as the first data, a mapping from a second virtual address to the first physical address in the forward mapping table, and to record, in the meta information table, the second virtual address corresponding to the first physical address.
In a possible implementation, the meta information table contains a virtual address set corresponding to the first physical address, and the first virtual address and the second virtual address are added to this virtual address set.
In a possible implementation, the data storage device further includes a deletion module. The data writing module is further configured to write the first data in the first physical address into a second physical address when garbage collection is performed on the first storage space. The information recording module is further configured to record the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and to add the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address. The information recording module is further configured to record, in the forward mapping table, the mapping from the first virtual address to the second physical address and the mapping from the second virtual address to the second physical address. The deletion module is configured to delete the first data in the first physical address; to delete, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address; and to delete, from the forward mapping table, the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
In a third aspect, an embodiment of the present application provides a storage device in which a memory is coupled to a processor. The memory is configured to store computer program code, and the computer program code includes computer instructions. When the computer instructions are executed by the processor, the storage device performs the method according to the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of the present application provides a computer storage medium including computer instructions. When the computer instructions are run on a computing device, the computing device performs the method according to the first aspect or any of its possible implementations.
In a fifth aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method according to the first aspect or any of its possible implementations.
Description of drawings
FIG. 1 is a first schematic diagram of the relationship between a virtual address and a physical address according to an embodiment of the present application;
FIG. 2 is a first schematic diagram of a storage system according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of a data storage method according to an embodiment of the present application;
FIG. 4 is a first schematic structural diagram of a hard disk according to an embodiment of the present application;
FIG. 5a is a first schematic diagram of writing data according to an embodiment of the present application;
FIG. 5b is a second schematic diagram of writing data according to an embodiment of the present application;
FIG. 5c is a first schematic diagram of garbage collection according to an embodiment of the present application;
FIG. 5d is a first schematic diagram of data deletion according to an embodiment of the present application;
FIG. 6 is a second schematic diagram of a data storage method according to an embodiment of the present application;
FIG. 7 is a third schematic diagram of a data storage method according to an embodiment of the present application;
FIG. 8 is a fourth schematic diagram of a data storage method according to an embodiment of the present application;
FIG. 9 is a first schematic diagram of a data storage device according to an embodiment of the present application;
FIG. 10 is a second schematic diagram of a data storage device according to an embodiment of the present application.
具体实施方式Detailed ways
The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone.
The terms "first", "second", and the like in the description and claims of the embodiments of the present application are used to distinguish different objects, not to describe a particular order of the objects. For example, the first physical address and the second physical address are used to distinguish different physical addresses, not to describe a particular order of physical addresses.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.
In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more. For example, multiple processing units means two or more processing units, and multiple systems means two or more systems.
With the development of Internet technology, enterprise demand for information and data storage is growing rapidly. Data deduplication, as a conventional data storage technique, is widely favored by enterprises.
An existing data storage method based on data deduplication works as follows. When data is to be written to a hard disk (the data to be written is referred to below as target data A), a processor in the data storage system determines whether target data A is already stored on the hard disk. If target data A is not present on the hard disk, the processor stores target data A in the storage block corresponding to physical address 1 and points virtual address 1, which corresponds to the target data, to physical address 1. At the same time, the mapping between virtual address 1 and physical address 1 is recorded in a forward mapping table, as shown in the first row of Table 1 below. The forward mapping table includes at least the fields "virtual address" and "physical address": the "virtual address" field stores the virtual address corresponding to specific data, and the "physical address" field stores the physical address to which that virtual address points. In addition, a reverse mapping table records the number of virtual addresses pointing to physical address 1 (referred to as the reference count) and the set of virtual addresses pointing to physical address 1, where the virtual address set records all virtual addresses pointing to physical address 1, as shown in Table 2 below. The reverse mapping table includes at least the fields "physical address", "reference count", and "virtual address set": the "physical address" field stores the physical address corresponding to the data, the "reference count" field stores the number of virtual addresses pointing to that physical address, and the "virtual address set" field stores all virtual addresses pointing to that physical address.
Table 1
Figure PCTCN2022078858-appb-000001
Table 2
Figure PCTCN2022078858-appb-000002
If target data A is already stored on the hard disk, then, as shown in FIG. 1, virtual address 1 corresponding to target data A is pointed to physical address 1 corresponding to target data A (that is, target data A is already stored in storage block A in the storage space), and target data A does not need to be stored on the hard disk again. At the same time, the mapping between virtual address 1 and physical address 1 is recorded in the forward mapping table, as shown in Table 3, and the virtual addresses pointing to the physical address, together with the reference count of that physical address, are updated in the reverse mapping table, as shown in Table 4.
Table 3
Virtual address      Physical address
Virtual address 1    Physical address 1
Virtual address 2    Physical address 1
Virtual address 3    Physical address 1
Table 4
Figure PCTCN2022078858-appb-000003
When garbage collection is performed on the hard disk of the data storage system, target data A is migrated from physical address 1 to physical address 2. At the same time, the reverse mapping table is queried for the reference count and the virtual address set of physical address 1, and the reference count and virtual address set corresponding to physical address 2 are updated with those of physical address 1. Then, the virtual addresses in the virtual address set corresponding to physical address 1 in the reverse mapping table are redirected to physical address 2. Finally, in the forward mapping table, the mapping between each such virtual address and physical address 1 is updated to a mapping between that virtual address and physical address 2.
However, because the reverse mapping table stores a relatively large amount of data, querying it for the reference count and virtual address set of physical address 1 takes a long time, so obtaining the reference count and virtual address set corresponding to a physical address is inefficient.
To address the low efficiency of obtaining the reference count and virtual address set corresponding to a physical address in the prior art, the embodiments of the present application provide a data storage method and device, implemented as follows: determine a first physical address corresponding to first data, where the first physical address indicates at least one storage block in a first storage space; write the first data to the first physical address; record, in a forward mapping table, the mapping from a first virtual address to the first physical address, where the first virtual address is the virtual address of the first data, and the first physical address includes an identifier of the first storage space and an offset address of the first physical address within the first storage space; and record the first physical address in a meta information table corresponding to the first storage space, and record in that meta information table the first virtual address corresponding to the first physical address.
With the technical solution provided in the embodiments of the present application, the reference count and virtual address set corresponding to a storage block can be obtained efficiently.
The data storage method and device provided in the embodiments of the present application may be applied to the storage system shown in FIG. 2, which may be the storage system of a solid-state drive. As shown in FIG. 2, the data storage device includes a main controller (main control for short) 201 and multiple flash memory chips 205, where the main control 201 includes a processor 202, a host interface 204, and n (n>0) channel controllers 203.
The main control 201 is configured to issue executable commands to the multiple flash memory chips 205, so as to read data from or write data to the flash memory chips 205.
The host interface 204 is configured to communicate with a host, receive a command request sent by the host, and forward the command request to the processor 202. The host may be any device, including but not limited to a server, a personal computer, or an array controller.
The processor 202 sends executable commands to the multiple flash memory chips 205 according to the command request forwarded by the host interface 204. The processor 202 includes one or more CPUs, each of which may be a single-core CPU (single-CPU) or a multi-core CPU (multi-CPU).
The channel controllers 203 are configured to carry the executable commands issued by the processor 202 to the multiple flash memory chips 205.
Optionally, the storage device further includes a bus 206. The processor 202, the channel controllers 203, the host interface 204, and the flash memory chips 205 are generally connected to one another through the bus 206, or may be connected in other ways.
When the storage system receives a data write request from the service layer, the host interface 204 in the main control 201 forwards the data write request to the processor 202 in the main control 201. According to the data write request, the processor 202 issues a data write instruction to the flash memory chips 205 through the channel controllers 203, so that data is written to the flash memory chips 205.
Optionally, the device that performs the data storage method provided in the embodiments of the present application may be the processor in the main controller of the storage system shown in FIG. 2. As shown in FIG. 3, the method may include S310-S340.
S310. The data storage device determines a first physical address corresponding to first data.
It should be understood that the first data is data not yet stored on the hard disk.
In the embodiments of the present application, the first physical address indicates at least one storage block in a first storage space, where a storage space is a larger-granularity region obtained by dividing the hard disk. As shown in FIG. 4, a hard disk is divided into 3 storage spaces, each row being one storage space, and each storage space includes N storage blocks, where N is greater than or equal to 2 and less than the total number of storage blocks on the hard disk.
It should be understood that a flash memory chip 205 in FIG. 2 may consist of multiple storage spaces, or multiple flash memory chips 205 may together form one storage space. The present application does not limit how storage spaces are sized or divided.
Optionally, a storage space may be a persistent log storage space (persistent log, Plog), where a Plog supports writing data in an append-only manner.
Exemplarily, taking log-structured storage as an example, the hard disk is divided into multiple Plogs, and each storage space described above corresponds to one Plog.
It should be noted that the subsequent description of the embodiments of the present application uses the case in which one storage space is one Plog as an example.
The first physical address may be calculated by the data storage device from the first data, or may be randomly allocated by the memory; this is not limited in the embodiments of the present application.
S320. The data storage device writes the first data to the first physical address.
It should be understood that the first data may be stored in one storage block of the first storage space or in multiple storage blocks of the first storage space, depending on the size of the first data and the capacity of a storage block.
In the embodiments of the present application, the first physical address includes the identifier of the first storage space and the offset address of the first physical address within the first storage space (that is, the offset address, within the first storage space, of the storage block used to store the first data). Exemplarily, based on the above example in which the first storage space is a Plog, the first physical address is Plog id+offset, where Plog id is the identifier of the Plog on the hard disk and offset is the offset address of the storage block in that Plog used to store the first data. For example, if the 100th storage block in the Plog is used to store the first data, the offset is 100.
Writing the first data to the at least one storage block in the first storage space indicated by the first physical address may specifically be performed as follows: determine the first storage space according to the unique identifier of the first storage space (that is, the Plog id) in the first physical address; then determine the storage block that is to store the first data according to the offset address of the storage block within the first storage space, which is contained in the first physical address; finally, write the first data to that storage block.
Exemplarily, based on the above example, assume that the identifier of the first storage space and the offset address of the storage block within the first storage space contained in the first physical address are Plog3_100. The data storage device locates the Plog whose Plog id is 3 according to the first physical address, then finds the 100th storage block in that Plog counting from left to right, and finally writes the first data to the 100th storage block in that Plog.
Optionally, the first physical address may instead be a specific address on the hard disk consisting of multiple hexadecimal characters, where the storage location corresponding to that address is one or more storage blocks in the first storage space.
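The "Plog id+offset" form of a physical address can be illustrated with a short sketch. It assumes the textual layout `Plog<id>_<offset>` taken from the Plog3_100 example; the names `PhysicalAddress`, `format_address`, and `parse_address` are hypothetical and are not part of the application.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    """Hypothetical model of a physical address: storage space id plus block offset."""
    plog_id: int   # identifier of the storage space (Plog)
    offset: int    # offset of the storage block inside that Plog


def format_address(addr: PhysicalAddress) -> str:
    """Render an address in the 'Plog<id>_<offset>' style used in the example."""
    return f"Plog{addr.plog_id}_{addr.offset}"


def parse_address(text: str) -> PhysicalAddress:
    """Split a string such as 'Plog3_100' into its Plog id and block offset."""
    prefix, offset = text.split("_", 1)
    if not prefix.startswith("Plog"):
        raise ValueError(f"not a Plog address: {text!r}")
    return PhysicalAddress(int(prefix[len("Plog"):]), int(offset))


# Locate the Plog first (id 3), then the storage block inside it (offset 100).
addr = parse_address("Plog3_100")
```

Because the storage space identifier is carried inside the address itself, a component that holds only a physical address can tell which storage space, and therefore which meta information table, the address belongs to.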
S330. The data storage device records the mapping from the first virtual address to the first physical address in the forward mapping table.
The first virtual address is the virtual address corresponding to the first data. The forward mapping table records the mapping between virtual addresses and physical addresses, where each virtual address points to its physical address. For the records in the forward mapping table, refer to Table 1 above.
Optionally, in the forward mapping table, one physical address has a mapping relationship with at least one virtual address. That is, in the forward mapping table, the relationship between a physical address and virtual addresses may be one-to-many or one-to-one.
It should be noted that, when the data storage device reads the data corresponding to a virtual address, it looks up the physical address corresponding to that virtual address in the forward mapping table and then reads the data from the storage block corresponding to that physical address.
S340. The data storage device records the first physical address in the meta information table corresponding to the first storage space, and records in that meta information table the first virtual address corresponding to the first physical address.
It should be understood that meta information tables and storage spaces are in one-to-one correspondence: one storage space corresponds to one meta information table, and the identifier of a meta information table is the identifier of its storage space. In other words, the meta information table of a storage space can be obtained from the identifier of that storage space. The meta information table records physical addresses and, for each physical address, the set of virtual addresses pointing to it. A physical address and its virtual address set may be stored as a key-value pair, where the key records the physical address and the value records all virtual addresses pointing to that physical address.
As shown in FIG. 5a, virtual address 1 points to physical address 1 and virtual address 10 points to physical address 10, where physical address 1 and physical address 10 both belong to the first storage space. The meta information table corresponding to the first storage space is therefore as shown in Table 5 below: physical address 1 corresponds to virtual address 1, and physical address 10 corresponds to virtual address 10.
Table 5
Figure PCTCN2022078858-appb-000004
It should be noted that the present application does not limit the execution order of S330 and S340: the data storage device may perform S330 and then S340, or may perform S340 and then S330.
In the data storage method provided in the embodiments of the present application, the first data is stored in at least one storage block in the first storage space indicated by the first physical address, and the first physical address and the virtual address corresponding to it are recorded in the meta information table corresponding to the first storage space, where the first physical address includes the identifier of the first storage space and the offset address, within the first storage space, of the storage block storing the first data. Therefore, when all virtual addresses corresponding to the first physical address need to be obtained, the meta information table corresponding to the first storage space is obtained according to the identifier of the first storage space in the first physical address; the set of virtual addresses pointing to the first physical address is then looked up in that meta information table according to the first physical address; and the virtual addresses in that set are counted to obtain the number of virtual addresses pointing to the first physical address. Because the meta information table corresponding to the first storage space stores only the mappings between the physical addresses of storage blocks in the first storage space and their corresponding virtual addresses, only the records in the meta information table identified by the first storage space need to be traversed, and records for physical addresses elsewhere on the hard disk do not. Querying the meta information table corresponding to the first storage space therefore takes less time, which improves the efficiency of obtaining the reference count and virtual address set corresponding to a storage block.
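The write path S320-S340 and the per-storage-space lookup described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation in the application: physical addresses are modeled as `(plog_id, offset)` tuples, and the names `forward_map`, `meta_tables`, `disk`, `write_new`, and `referrers` are hypothetical.

```python
from collections import defaultdict

# Forward mapping table: virtual address -> physical address (plog_id, offset).
forward_map = {}
# One meta information table per storage space, keyed by the storage space id.
# Each table maps a physical address (key) to the set of virtual addresses
# pointing to it (value).
meta_tables = defaultdict(dict)
# Stand-in for the hard disk: physical address -> stored data.
disk = {}


def write_new(virtual_addr, data, physical_addr):
    """S320-S340 for data not yet stored; physical_addr is the result of S310."""
    plog_id, _offset = physical_addr
    disk[physical_addr] = data                                   # S320
    forward_map[virtual_addr] = physical_addr                    # S330
    table = meta_tables[plog_id]                                 # S340
    table.setdefault(physical_addr, set()).add(virtual_addr)


def referrers(physical_addr):
    """Return (reference count, virtual address set) using one Plog's table only."""
    plog_id, _offset = physical_addr
    vaddrs = meta_tables[plog_id].get(physical_addr, set())
    return len(vaddrs), set(vaddrs)


write_new("va1", b"first data", (3, 100))
write_new("va10", b"other data", (3, 101))
count, vaddrs = referrers((3, 100))
```

In this sketch the reference count is not stored separately; it is derived by counting the virtual address set, which matches the counting step described in the paragraph above.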
Optionally, after the first data has been successfully written to the hard disk, when the data storage device writes second data to the hard disk, the data storage method provided in the embodiments of the present application further includes S610-S630, as shown in FIG. 6 in conjunction with FIG. 3.
S610. The data storage device determines whether the second data is the same as the first data.
One way to determine whether the second data is the same as the first data is as follows: the data storage device calculates fingerprint information for the second data (for example, a hash value of the second data) and compares it with the fingerprint information of the first data. If the two fingerprints are identical, the second data is the same as the first data; if the two fingerprints are not identical, the second data is different from the first data.
Optionally, other methods may be used to compare the second data with the first data. The embodiments of the present application do not limit the method for determining whether the second data is the same as the first data.
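The fingerprint comparison in S610 can be sketched as follows. The application only says that the fingerprint may be a hash value; the choice of SHA-256 from Python's standard `hashlib` module, and the helper names `fingerprint` and `same_data`, are assumptions made for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Fingerprint of a data block; SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(data).hexdigest()


def same_data(first: bytes, second: bytes) -> bool:
    """Treat two blocks as identical when their fingerprints are identical."""
    return fingerprint(first) == fingerprint(second)
```

Comparing fingerprints rather than full contents means a stored fingerprint can stand in for the first data, so the first data does not have to be read back from the hard disk for every incoming write.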
When the second data has not previously been written to the hard disk, the second data is written to a storage block on the hard disk; for the method of writing the second data, refer to the process of writing the first data described in S310-S340 in the foregoing embodiment. When the second data is the same as the first data, S620 is performed.
S620. If the second data is the same as the first data, the data storage device records the mapping from a second virtual address to the first physical address in the forward mapping table.
It should be understood that the second virtual address is the virtual address corresponding to the second data, and the second virtual address points to the first physical address.
The record of the second virtual address and the first physical address in the forward mapping table is as shown in Table 3. For the specific implementation process, refer to the detailed description of S330 above, which is not repeated here.
S630. The data storage device records, in the meta information table, the second virtual address corresponding to the first physical address.
Optionally, recording the second virtual address corresponding to the first physical address in the meta information table may be performed by adding the second virtual address to the virtual address set corresponding to the first physical address in the meta information table.
As shown in FIG. 5b, virtual address 1 points to physical address 1, virtual address 2 also points to physical address 1, and virtual address 10 points to physical address 10, where physical address 1 and physical address 10 both belong to the first storage space. The meta information table corresponding to the first storage space may therefore be as shown in Table 6 below: physical address 1 corresponds to virtual address 1 and virtual address 2, and physical address 10 corresponds to virtual address 10.
Table 6
Figure PCTCN2022078858-appb-000005
Figure PCTCN2022078858-appb-000006
In the data storage method provided in the embodiments of the present application, when the data storage device writes data to the hard disk, it determines whether the data has already been written. If the data has already been written, a mapping is established between the virtual address corresponding to the data and the physical address corresponding to the data, and the data is not written to the hard disk again, which saves storage space on the hard disk.
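The deduplicating write described in S610-S630 can be sketched as a single function. This is an illustrative sketch only: the fingerprint index `fingerprints`, the `allocate` callback, the SHA-256 hash, and the `(plog_id, offset)` tuple form of a physical address are assumptions introduced here, not details given in the application.

```python
import hashlib

forward_map = {}    # virtual address -> physical address (plog_id, offset)
meta_tables = {}    # plog_id -> {physical address -> set of virtual addresses}
fingerprints = {}   # hypothetical index: fingerprint -> physical address
disk = {}           # stand-in for the hard disk


def write(virtual_addr, data, allocate):
    """Write data once; later identical writes only add mappings (S620, S630)."""
    fp = hashlib.sha256(data).hexdigest()            # S610: compute fingerprint
    physical_addr = fingerprints.get(fp)
    if physical_addr is None:                        # data not stored yet
        physical_addr = allocate()                   # S310
        disk[physical_addr] = data                   # S320
        fingerprints[fp] = physical_addr
    forward_map[virtual_addr] = physical_addr        # S330 / S620
    table = meta_tables.setdefault(physical_addr[0], {})
    table.setdefault(physical_addr, set()).add(virtual_addr)   # S340 / S630
    return physical_addr


free_blocks = iter([(1, 0), (1, 1)])


def allocate():
    """Hypothetical allocator handing out the next free storage block."""
    return next(free_blocks)


write("va1", b"A", allocate)
write("va2", b"A", allocate)    # duplicate: no second copy is stored
write("va10", b"B", allocate)
```

After these three writes the disk holds two blocks, and the virtual address set for the block holding `b"A"` contains both `va1` and `va2`, which mirrors the situation in FIG. 5b.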
可选的,本申请实施例提供的数据存储方法,基于图6,如图7所示,该数据存储方法还包括:S710-S770。Optionally, the data storage method provided in the embodiment of the present application is based on FIG. 6 , and as shown in FIG. 7 , the data storage method further includes: S710-S770.
S710、当对第一存储空间进行垃圾回收时,数据存储装置将第一物理地址中的第一数据写入第二物理地址。S710. When performing garbage collection on the first storage space, the data storage device writes the first data in the first physical address into the second physical address.
上述第二物理地址用指示的第二存储空间中的至少一个存储块,其中,第二物理地址包括第二存储空间的标识和第二物理地址在第二存储空间中的偏移地址;At least one storage block in the second storage space indicated by the above-mentioned second physical address, wherein the second physical address includes an identifier of the second storage space and an offset address of the second physical address in the second storage space;
需要说明的是,上述第一存储空间与第二存储空间为不同的存储空间,也就是说上述第二存储空间是该硬盘上除第一存储空间以外的其他存储空间,具体本申请不进行限定。It should be noted that the above-mentioned first storage space and the second storage space are different storage spaces, that is to say, the above-mentioned second storage space is other storage spaces on the hard disk except the first storage space, which is not limited in this application. .
在本申请实施例中,上述将第一物理地址中的第一数据写入第二物理地址指示的第二存储空间中的至少一个存储块的方法具体为:根据第一物理地址中存储空间的标识信息确定第一物理地址对应的存储空间(即:第一存储空间),然后,在第一存储空间中根据第一物理地址中的偏移地址确定存储第一数据的存储块,并在该存储块上读取第一数据;最后,将读取的第一数据写入第二物理地址指示的第二存储空间中的至少一个存储块。In the embodiment of the present application, the method for writing the first data in the first physical address into at least one storage block in the second storage space indicated by the second physical address is as follows: according to the storage space in the first physical address The identification information determines the storage space corresponding to the first physical address (that is, the first storage space), and then, in the first storage space, determines the storage block storing the first data according to the offset address in the first physical address, and in the first storage space reading the first data from the storage block; finally, writing the read first data into at least one storage block in the second storage space indicated by the second physical address.
The method of writing the first data into the second physical address is similar to the method of writing the first data into the first physical address in S320; for details, refer to the detailed description above, which is not repeated here.
S720. The data storage device records the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and adds the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address.
In this embodiment of the present application, the meta information table corresponding to the first storage space is obtained according to the first physical address; then, the set of virtual addresses pointing to the first physical address is obtained from that meta information table according to the first physical address; then, the second physical address is recorded in the meta information table of the second storage space, and the first virtual address and the second virtual address are added to the virtual address set corresponding to the second physical address.
S730. The data storage device records the mapping from the first virtual address to the second physical address in the forward mapping table.
The method of recording the first virtual address and the second physical address in the forward mapping table is similar to the method of recording the first virtual address and the first physical address in the forward mapping table in S330; for details, refer to the detailed description of the foregoing embodiment, which is not repeated here.
S740. The data storage device records the mapping from the second virtual address to the second physical address in the forward mapping table.
For the method of recording the second virtual address and the second physical address in the forward mapping table, refer to the detailed description of S330 above, which is not repeated here.
S750. The data storage device deletes the first data in the first physical address.
S760. The data storage device deletes, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address.
It should be understood that deleting the first virtual address and the second virtual address from the virtual address set corresponding to the first physical address means deleting all virtual addresses in that set.
S770. The data storage device deletes the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address in the forward mapping table.
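Steps S710-S770 may be illustrated with the following minimal sketch. It is an assumption-laden model rather than the implementation of the embodiment: the function name relocate is hypothetical, a physical address is modeled as a (storage space identifier, offset) tuple, the forward mapping table and the meta information tables are modeled as in-memory dictionaries, and the stored blocks are a dictionary keyed by physical address.

```python
def relocate(forward_map, meta_tables, blocks, old_pa, new_pa):
    """Move the data at old_pa to new_pa during garbage collection (hypothetical helper).

    forward_map: virtual address -> physical address
    meta_tables: storage space id -> {physical address -> set of virtual addresses}
    blocks:      physical address -> stored data
    """
    old_space, new_space = old_pa[0], new_pa[0]

    # S710: write the data in the first physical address into the second one.
    blocks[new_pa] = blocks[old_pa]

    # S720: look up every virtual address pointing at old_pa in the meta
    # information table and record them under new_pa in the second space's table.
    vaddrs = set(meta_tables[old_space][old_pa])
    meta_tables.setdefault(new_space, {})[new_pa] = set(vaddrs)

    # S730/S740: record the virtual-to-second-physical mappings.
    for va in vaddrs:
        forward_map[va] = new_pa

    # S750: delete the data in the first physical address.
    del blocks[old_pa]

    # S760: delete old_pa and its virtual address set from the first space's table.
    del meta_tables[old_space][old_pa]

    # S770: delete any remaining mapping to old_pa. With a dictionary the
    # overwrite in S730/S740 already replaced these entries, so this is a no-op here.
    for va in vaddrs:
        if forward_map.get(va) == old_pa:
            del forward_map[va]


# Usage: two virtual addresses share physical address (1, 0) after deduplication.
pa1, pa2 = (1, 0), (2, 0)
forward_map = {"va1": pa1, "va2": pa1}
meta_tables = {1: {pa1: {"va1", "va2"}}, 2: {}}
blocks = {pa1: b"first data"}
relocate(forward_map, meta_tables, blocks, pa1, pa2)
```

The sketch shows why the per-space meta information table is useful: the virtual addresses that must be redirected are found by a single lookup on the first physical address, without scanning the forward mapping table.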
As shown in FIG. 5c, after garbage collection is performed on the first storage space corresponding to physical address 1, all virtual addresses in the virtual address set corresponding to physical address 1 in the meta information table of the first storage space are written to the virtual address set corresponding to physical address 2 in the meta information table of the second storage space. The meta information table corresponding to the second storage space is shown in Table 7 below, in which virtual address 1 and virtual address 2 each have a mapping relationship with physical address 2. In addition, physical address 1 and the virtual address set corresponding to physical address 1 are deleted from the meta information table corresponding to the first storage space; as shown in Table 8, the meta information table corresponding to the first storage space then contains no virtual address related to physical address 1.
Table 7
Figure PCTCN2022078858-appb-000007
Table 8
Figure PCTCN2022078858-appb-000008
Based on FIG. 6 and as shown in FIG. 8, the data storage method provided in this embodiment of the present application further includes S810-S830.
S810. The data storage device deletes the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address in the forward mapping table.
S820. The data storage device deletes, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address.
This deletion is performed in a manner similar to that of S760 above; for details, refer to the detailed description above, which is not repeated here.
S830. The data storage device deletes the first data in the first physical address.
As shown in FIG. 5d, after the data storage device deletes the first data in the first physical address, physical address 1 and the virtual address set corresponding to physical address 1 are deleted from the meta information table corresponding to the first storage space; as shown in Table 8, the meta information table corresponding to the first storage space then contains no data related to the physical address.
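The deletion flow of S810-S830 may be sketched in the same simplified model. The function name delete_data is hypothetical, the tables are plain dictionaries, and the sketch assumes that the virtual addresses to be removed from the forward mapping table are looked up in the meta information table of the first storage space.

```python
def delete_data(forward_map, meta_table, blocks, pa):
    """Delete the data at physical address pa together with its bookkeeping.

    forward_map: virtual address -> physical address
    meta_table:  physical address -> set of virtual addresses (one storage space)
    blocks:      physical address -> stored data
    """
    # S810: delete every forward mapping that points at pa. The virtual
    # addresses are taken from the meta information table (an assumption of
    # this sketch) so the forward mapping table does not have to be scanned.
    for va in list(meta_table.get(pa, ())):
        if forward_map.get(va) == pa:
            del forward_map[va]

    # S820: delete pa and its virtual address set from the meta information table.
    meta_table.pop(pa, None)

    # S830: delete the data stored at pa.
    blocks.pop(pa, None)


# Usage: physical address (1, 0) is shared by two virtual addresses.
pa1, other_pa = (1, 0), (1, 1)
forward_map = {"va1": pa1, "va2": pa1, "va3": other_pa}
meta_table = {pa1: {"va1", "va2"}, other_pa: {"va3"}}
blocks = {pa1: b"first data", other_pa: b"other data"}
delete_data(forward_map, meta_table, blocks, pa1)
```

Entries that belong to other physical addresses in the same storage space are left untouched.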
Correspondingly, an embodiment of the present application provides a data storage device configured to perform each step of the foregoing data storage method. In this embodiment of the present application, the data storage device may be divided into functional modules according to the foregoing method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The division of modules in this embodiment of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.
In the case where each functional module is divided corresponding to each function, FIG. 9 shows a possible schematic structural diagram of the data storage device involved in the foregoing embodiments. As shown in FIG. 9, the data storage device includes a determining module 901, a data writing module 902, and an information recording module 903.
The determining module 901 is configured to determine the first physical address corresponding to the first data, for example, to perform step S310 in the foregoing method embodiment.
The data writing module 902 is configured to write the first data into the first physical address, for example, to perform step S320 in the foregoing method embodiment.
The information recording module 903 is configured to record the mapping from the first virtual address to the first physical address in the forward mapping table, for example, to perform step S330 in the foregoing method embodiment.
The information recording module 903 is further configured to record the first physical address in the meta information table corresponding to the first storage space, and to record the first virtual address corresponding to the first physical address in that meta information table, for example, to perform step S340 in the foregoing method embodiment.
Optionally, the determining module 901 is further configured to determine whether the second data is the same as the first data, for example, to perform step S610 in the foregoing method embodiment.
The information recording module 903 is further configured to record, when the second data is the same as the first data, the mapping from the second virtual address to the first physical address in the forward mapping table, for example, to perform step S620 in the foregoing method embodiment.
The information recording module 903 is further configured to record the second virtual address corresponding to the first physical address in the meta information table, for example, to perform step S630 in the foregoing method embodiment.
Optionally, the data storage device provided in this embodiment of the present application further includes a deletion module 904.
The data writing module 902 is further configured to write, when garbage collection is performed on the first storage space, the first data in the first physical address into the second physical address, for example, to perform step S710 in the foregoing method embodiment.
The information recording module 903 is further configured to record the second physical address in the meta information table corresponding to the second storage space where the second physical address is located, and to add the first virtual address and the second virtual address to the virtual address set corresponding to the second physical address, for example, to perform step S720 in the foregoing method embodiment.
The information recording module 903 is further configured to record, in the forward mapping table, the mapping from the first virtual address to the second physical address and the mapping from the second virtual address to the second physical address, for example, to perform steps S730 and S740 in the foregoing method embodiment.
The deletion module 904 is configured to delete the first data in the first physical address, for example, to perform step S750 in the foregoing method embodiment.
The deletion module 904 is further configured to delete, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address, for example, to perform step S760 in the foregoing method embodiment.
The deletion module 904 is further configured to delete, from the forward mapping table, the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address, for example, to perform steps S770 and S810 in the foregoing method embodiment.
Each module of the data storage device may also be used to perform other actions in the foregoing method embodiments. All relevant content of the steps involved in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules, and is not repeated here.
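As an illustration of how the determining, data writing, and information recording functions cooperate on the write path (S310-S340) and the duplicate-data path (S610-S630), a minimal sketch is given below. It is not the device of FIG. 9: the class name DataStorageDevice and its attributes are hypothetical, a single append-only storage space with identifier 0 is assumed, and the use of a SHA-256 digest to decide whether the second data is the same as the first data is an assumption of the sketch, since the manner of comparison is not specified here.

```python
import hashlib


class DataStorageDevice:
    """Hypothetical single-space model of the write and deduplication paths."""

    SPACE_ID = 0  # assumption: one storage space only

    def __init__(self):
        self.forward_map = {}    # virtual address -> physical address
        self.meta_table = {}     # physical address -> set of virtual addresses
        self.blocks = []         # append-only storage; list index is the offset
        self._fingerprints = {}  # digest -> physical address (assumed dedup index)

    def write(self, vaddr, data: bytes):
        digest = hashlib.sha256(data).hexdigest()

        # Determining function (S610): is identical data already stored?
        paddr = self._fingerprints.get(digest)
        if paddr is None:
            # Determining function (S310): pick the physical address, here
            # the space identifier plus the next free offset.
            paddr = (self.SPACE_ID, len(self.blocks))
            # Data writing function (S320): store the data once.
            self.blocks.append(data)
            self._fingerprints[digest] = paddr

        # Information recording function (S330 / S620): forward mapping.
        self.forward_map[vaddr] = paddr
        # Information recording function (S340 / S630): meta information table.
        self.meta_table.setdefault(paddr, set()).add(vaddr)
        return paddr


# Usage: the second write carries identical data and is not stored again.
device = DataStorageDevice()
first = device.write("va1", b"payload")
second = device.write("va2", b"payload")
```

The sketch does not handle overwriting a virtual address that already maps to different data; a fuller model would first remove that virtual address from the previous physical address's set.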
In the case where an integrated unit is used, a schematic structural diagram of the data storage device provided in this embodiment of the present application is shown in FIG. 10. In FIG. 10, the data storage device includes a processing module 1001 and a communication module 1002. The processing module 1001 is configured to control and manage the actions of the data storage device, for example, to perform the steps performed by the determining module 901, the data writing module 902, the information recording module 903, and the deletion module 904, and/or other processes of the techniques described herein. The communication module 1002 is configured to support interaction between the data storage device and other devices. As shown in FIG. 10, the data storage device may further include a storage module 1003, which is configured to store the program code of the data storage device as well as the first data, the second data, and the like.
The processing module 1001 may be a processor or a controller, for example, the processor 202 in the main controller 201 in FIG. 2. The communication module 1002 may be a transceiver, an RF circuit, a communication interface, or the like, for example, the host interface 204 in the main controller 201 and/or the channel controller 203 in the main controller 201 in FIG. 2. The storage module 1003 may be a memory, for example, the memory chip 205 in FIG. 2.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable hard disk, or transmitted from one computer-readable hard disk to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable hard disk may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state drive (SSD)), or the like.
From the description of the foregoing implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional modules is used as an example. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, device, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the modules or units is merely a logical function division, and other division manners may be used in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable hard disk. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a hard disk and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned hard disk includes various media capable of storing program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

  1. A data storage method, characterized by comprising:
    determining a first physical address corresponding to first data, wherein the first physical address is used to indicate at least one storage block in a first storage space; and writing the first data into the first physical address;
    recording a mapping from a first virtual address to the first physical address in a forward mapping table, wherein the virtual address of the first data is the first virtual address, and the first physical address includes an identifier of the first storage space and an offset address of the first physical address in the first storage space;
    recording the first physical address in a meta information table corresponding to the first storage space, and recording, in the meta information table, the first virtual address corresponding to the first physical address.
  2. The method according to claim 1, characterized in that the method further comprises:
    determining whether second data is the same as the first data, wherein the second data is data to be written;
    when the second data is the same as the first data, recording a mapping from a second virtual address to the first physical address in the forward mapping table, wherein the virtual address of the second data is the second virtual address;
    recording, in the meta information table, the second virtual address corresponding to the first physical address.
  3. The method according to claim 2, characterized in that the meta information table contains a virtual address set corresponding to the first physical address, and the first virtual address and the second virtual address are added to the virtual address set.
  4. The method according to claim 3, characterized in that the method further comprises:
    when garbage collection is performed on the first storage space, writing the first data in the first physical address into a second physical address;
    recording the second physical address in a meta information table corresponding to a second storage space where the second physical address is located, and adding the first virtual address and the second virtual address to a virtual address set corresponding to the second physical address;
    recording, in the forward mapping table, a mapping from the first virtual address to the second physical address and a mapping from the second virtual address to the second physical address;
    deleting the first data in the first physical address;
    deleting, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address;
    deleting, from the forward mapping table, the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
  5. The method according to any one of claims 1 to 4, characterized in that
    the storage space is a persistent log storage space, and the persistent log storage space supports writing data in an append-write manner.
  6. A data storage device, characterized by comprising: a determining module, a data writing module, and an information recording module;
    the determining module is configured to determine a first physical address corresponding to first data, wherein the first physical address is used to indicate at least one storage block in a first storage space;
    the data writing module is configured to write the first data into the first physical address;
    the information recording module is configured to record a mapping from a first virtual address to the first physical address in a forward mapping table, wherein the virtual address of the first data is the first virtual address, and the first physical address includes an identifier of the first storage space and an offset address of the first physical address in the first storage space;
    the information recording module is further configured to record the first physical address in a meta information table corresponding to the first storage space, and to record, in the meta information table, the first virtual address corresponding to the first physical address.
  7. The data storage device according to claim 6, characterized in that
    the determining module is further configured to determine whether second data is the same as the first data, wherein the second data is data to be written;
    the information recording module is further configured to record, when the second data is the same as the first data, a mapping from a second virtual address to the first physical address in the forward mapping table;
    the information recording module is further configured to record, in the meta information table, the second virtual address corresponding to the first physical address.
  8. The data storage device according to claim 7, characterized in that the meta information table contains a virtual address set corresponding to the first physical address, and the first virtual address and the second virtual address are added to the virtual address set.
  9. The data storage device according to claim 8, characterized in that the data storage device further comprises a deletion module;
    the data writing module is further configured to write, when garbage collection is performed on the first storage space, the first data in the first physical address into a second physical address;
    the information recording module is further configured to record the second physical address in a meta information table corresponding to a second storage space where the second physical address is located, and to add the first virtual address and the second virtual address to a virtual address set corresponding to the second physical address;
    the information recording module is further configured to record, in the forward mapping table, a mapping from the first virtual address to the second physical address and a mapping from the second virtual address to the second physical address;
    the deletion module is configured to delete the first data in the first physical address; to delete, from the meta information table of the first storage space, the first physical address and the first virtual address and the second virtual address in the virtual address set corresponding to the first physical address; and to delete, from the forward mapping table, the mapping from the first virtual address to the first physical address and the mapping from the second virtual address to the first physical address.
  10. The data storage device according to any one of claims 6 to 9, characterized in that
    the storage space is a persistent log storage space, and the persistent log storage space supports writing data in an append-write manner.
  11. A storage device, characterized by comprising a memory and a processor, wherein the memory is coupled to the processor; the memory is configured to store computer program code, and the computer program code includes computer instructions; when the computer instructions are executed by the processor, the processor is caused to perform the method according to any one of claims 1 to 5.
  12. A computer storage medium, characterized by comprising computer instructions, wherein when the computer instructions are run on a computing device, the computing device is caused to perform the method according to any one of claims 1 to 5.
PCT/CN2022/078858 2021-08-31 2022-03-02 Data storage method and device WO2023029417A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111017305.X 2021-08-31
CN202111017305.XA CN115729846A (en) 2021-08-31 2021-08-31 Data storage method and device

Publications (1)

Publication Number Publication Date
WO2023029417A1 true WO2023029417A1 (en) 2023-03-09

Family

ID=85291861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078858 WO2023029417A1 (en) 2021-08-31 2022-03-02 Data storage method and device

Country Status (2)

Country Link
CN (1) CN115729846A (en)
WO (1) WO2023029417A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102084330A (en) * 2008-04-06 2011-06-01 弗森-艾奥公司 Apparatus, system, and method for efficient mapping of virtual and physical addresses
US20130346792A1 (en) * 2012-06-22 2013-12-26 International Business Machines Corporation Resolving memory faults with reduced processing impact
CN110399310A (en) * 2018-04-18 2019-11-01 杭州宏杉科技股份有限公司 A kind of recovery method and device of memory space
CN111367856A (en) * 2020-02-28 2020-07-03 杭州宏杉科技股份有限公司 Data copying method and device, electronic equipment and machine-readable storage medium
CN113094003A (en) * 2021-05-12 2021-07-09 湖南国科微电子股份有限公司 Data processing method, data storage device and electronic equipment

Also Published As

Publication number Publication date
CN115729846A (en) 2023-03-03

Similar Documents

Publication Publication Date Title
US10289304B2 (en) Physical address management in solid state memory by tracking pending reads therefrom
US9021189B2 (en) System and method for performing efficient processing of data stored in a storage node
US9092321B2 (en) System and method for performing efficient searches and queries in a storage node
US7970919B1 (en) Apparatus and system for object-based storage solid-state drive and method for configuring same
US20150262632A1 (en) Grouping storage ports based on distance
US9182912B2 (en) Method to allow storage cache acceleration when the slow tier is on independent controller
US10037161B2 (en) Tiered storage system, storage controller, and method for deduplication and storage tiering
US7127583B2 (en) Disk control system and control method of disk control system
WO2017025039A1 (en) Flash storage oriented data access method and device
US9430492B1 (en) Efficient scavenging of data and metadata file system blocks
WO2018171296A1 (en) File merging method and controller
WO2023035646A1 (en) Method and apparatus for expanding memory, and related device
US20220164316A1 (en) Deduplication method and apparatus
CN111158602A (en) Data layered storage method, data reading method, storage host and storage system
US20220164145A1 (en) Apparatus and system for object-based storage solid-state device
US10346077B2 (en) Region-integrated data deduplication
US20240086113A1 (en) Synchronous write method and device, storage system and electronic device
WO2023029417A1 (en) Data storage method and device
WO2022257685A1 (en) Storage system, network interface card, processor, and data access method, apparatus, and system
WO2023050856A1 (en) Data processing method and storage system
US11947419B2 (en) Storage device with data deduplication, operation method of storage device, and operation method of storage server
WO2023065654A1 (en) Data writing method and related device
WO2016029481A1 (en) Method and device for isolating disk regions
US20060277326A1 (en) Data transfer system and method
US20210311654A1 (en) Distributed Storage System and Computer Program Product

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22862566

Country of ref document: EP

Kind code of ref document: A1