WO2023245942A1

WO2023245942A1 - SSD finite window data deduplication identification method and apparatus, and computer device

Info

Publication number: WO2023245942A1
Application number: PCT/CN2022/127731
Authority: WO
Inventors: 王猛; 徐伟华; 韩道静
Original assignee: 苏州忆联信息系统有限公司
Priority date: 2022-06-24
Filing date: 2022-10-26
Publication date: 2023-12-28
Also published as: CN114974365A

Abstract

An SSD finite window data deduplication identification method and apparatus, a computer device, and a storage medium. The method comprises: if a command is a write command, allocating a write buffer and receiving data from a host, dividing the write command into data segments, and calculating a data digest for each data segment; determining whether the data digest of each data segment of the write command matches a data digest in a read data deduplication window; if so, further determining whether the data content of that data segment matches the corresponding data content in the read data deduplication window; and if so, sending the corresponding read data segment logical address and write data segment logical address to a write deduplication application module, so that the write deduplication application module uses mapping table copying instead of writing the write data to NAND. The present application can reduce data writes, improve SSD reliability, and greatly improve the performance of on-disk data copying.

Description

SSD limited window data deduplication identification method, device and computer equipment

This application claims priority to Chinese patent application No. 202210731384.9, filed with the China Patent Office on June 24, 2022 and entitled "SSD limited window data deduplication identification method, device and computer equipment", the entire contents of which are incorporated into this application by reference.

Technical field

The present application relates to the technical field of solid state drives, and in particular to an SSD limited window data deduplication identification method, device, computer equipment and storage medium.

Background art

With the development of solid-state drive (SSD) technology, SSDs have come into wide use in a variety of settings. They have gradually replaced traditional HDDs in the PC market, providing users with a better experience in terms of reliability and performance. An SSD uses NAND internally as its data storage medium. The host accesses the SSD by logical address (LBA), and the SSD internally maintains a logical-to-physical address mapping table (L2P table) that records the physical address at which the data for each logical address is stored. When data is written, a physical storage address is assigned to the logical address written by the host, and the corresponding L2P table entry is updated. When data is read, the L2P table is queried with the logical address to obtain the physical address, and the data is then read and returned to the host.
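The L2P behaviour described above can be illustrated with a minimal Python sketch. The class name `L2PTable`, its fields, and the append-only allocation of physical addresses are assumptions made purely for illustration; they are not taken from this application.

```python
class L2PTable:
    """Toy logical-to-physical mapping table (hypothetical names)."""

    def __init__(self):
        self.table = {}     # logical address (LBA) -> physical address
        self.nand = {}      # physical address -> stored data
        self.next_free = 0  # next unused physical address (assumed append-only)

    def write(self, lba, data):
        # Assign a new physical address to the logical address and update L2P.
        ppa = self.next_free
        self.next_free += 1
        self.nand[ppa] = data
        self.table[lba] = ppa
        return ppa

    def read(self, lba):
        # Query L2P with the logical address, then read the physical address.
        return self.nand[self.table[lba]]


l2p = L2PTable()
l2p.write(10, b"old")
l2p.write(10, b"new")  # rewriting an LBA maps it to a fresh physical address
```

In this sketch a rewrite of the same logical address never overwrites in place; it only redirects the table entry, which is the property the later mapping table copying relies on.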

Currently, existing SSDs hold a large amount of duplicate data: the logical and physical addresses differ, but the data content is identical, and storing it consumes additional NAND erase/write endurance. In a typical file copy, the host reads the data from the source logical address and then writes it to a new logical address (the destination address), so a large amount of duplicate data appears within a limited time window. During this process the data is read from and written to NAND multiple times and crosses the bus multiple times, which greatly limits performance. Furthermore, writing such duplicate data consumes additional NAND erase/write cycles and shortens the life of the SSD.

Technical problem

Based on this, it is necessary to provide an SSD limited window data deduplication identification method, device, computer equipment and storage medium to address the above technical problems.

Technical solutions

An SSD limited window data deduplication and identification method, the method includes:

Obtain the command issued by the host and determine whether the command is a read command;

If the command is a read command, the data reading is completed according to the normal path, the read command is divided into data segments, and the data summary is calculated for each segment of data;

Add the read data segment logical address, data summary and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

If the command is not a read command, continue to determine whether the command is a write command;

If the command is a write command, allocate a write buffer and receive data from the host, divide the write command into data segments, and calculate its data summary for each segment of data;

Determine whether the data summary of each data segment of the write command is consistent with the data summary in the read data deduplication window;

If the data digests are consistent, it is further determined whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window;

If the data content is also consistent, the corresponding read data segment logical address and write data segment logical address are sent to the write deduplication application module, which copies the mapping table instead of writing the write data to NAND.

In one embodiment, the step of dividing the read command into data segments, and calculating the data summary for each segment of data further includes:

Divide the read command into multiple data segments according to a certain size. Each data segment contains the read data segment logical address, data summary and data content;

Maintain the read data segments recently accessed by the host from far to near according to time to form a read deduplication queue.

In one embodiment, after the step of maintaining the read data segments recently accessed by the host from far to near in time and forming a read deduplication queue, the step further includes:

In the read data deduplication window, only a certain number of recently accessed read data segments are kept, and further records are discarded to reduce resource overhead and hit check costs.

In one embodiment, if the data digests are consistent, the step of further determining whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window further includes:

If the data digests are consistent, XOR the data content of the write data segment with the data content of the hit read data segment and determine whether the result is 0;

If the result is 0, the data is judged to be a duplicate, and the logical address of the write data segment and the logical address of the hit read data segment are sent to the write deduplication application module. The write deduplication application module copies the mapping table of the read data segment logical address to the mapping table of the write data segment logical address to complete the data copy.

An SSD limited window data deduplication and identification device. The SSD limited window data deduplication and identification device includes:

A first judgment module, the first judgment module is used to obtain the command issued by the host and judge whether the command is a read command;

A read deduplication window module, which is used to complete data reading according to the normal path if the command is a read command, divide the read command into data segments, and calculate a data digest for each data segment; and to add the read data segment logical address, data digest and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

a second judgment module, the second judgment module is used to continue to judge whether the command is a write command if the command is not a read command;

A write deduplication identification module, which is used to allocate a write buffer and receive data from the host if the command is a write command, divide the write command into data segments, and calculate a data digest for each data segment;

A third judgment module, the third judgment module is used to judge whether the data summary of each data segment of the write command is consistent with the data summary in the read data deduplication window;

A fourth judgment module, which is used to further judge whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window if the data digests are consistent, and, if the data content is also consistent, to send the corresponding read data segment logical address and write data segment logical address to the write deduplication application module;

A write deduplication application module, which is used to copy the mapping table instead of writing the write data to NAND.

In one embodiment, the read deduplication window module is also used to:

In one embodiment, the fourth judgment module is also used to:

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of any one of the above methods are implemented.

A computer-readable storage medium on which a computer program is stored, which implements the steps of any of the above methods when executed by a processor.

Beneficial effects

In the above SSD limited window data deduplication identification method, device, computer equipment and storage medium, the SSD internally maintains a read data deduplication window of a certain depth, containing the logical address, data digest and data content of recently read data segments. When the host writes data, the SSD generates a data digest for each written data segment in real time and compares it with the data digests in the current read data deduplication window. Where the digests match, the corresponding data content is further compared, so the SSD can independently identify duplicate data within a limited window and then use methods such as mapping table copying to reduce data writes and improve SSD reliability. At the same time, the performance of intra-disk data copying is greatly improved.

Description of the drawings

Figure 1 is a schematic diagram of a typical SSD reading and writing process in traditional technology;

Figure 2 is a schematic flow chart of host file data copying in traditional technology;

Figure 3 is a schematic flowchart of the SSD limited window data deduplication and identification method in one embodiment;

Figure 4 is a schematic flowchart of an SSD limited window data deduplication and identification method in another embodiment;

Figure 5 is a schematic diagram of a limited window deduplication module introduced in one embodiment;

Figure 6 is a schematic diagram of read deduplication window maintenance in one embodiment;

Figure 7 is a schematic diagram of the specific implementation of the SSD limited window data deduplication and identification method in one embodiment;

Figure 8 is a structural block diagram of an SSD limited window data deduplication and identification device in one embodiment;

Figure 9 is an internal structure diagram of a computer device in one embodiment.

Embodiments of the invention

In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.

Refer to the typical SSD read and write process shown in Figure 1, which includes the following steps:

The host sends a command to the SSD hardware module.

The SSD hardware module (PCIe/NVMe) receives the command and passes it to the firmware module for processing.

The SSD firmware front-end module divides the command into mapping units (typically 4KB).

The operation request is submitted to the buffer management module, and a read/write buffer is allocated.

If it is a write command, data transmission with the host is established based on the allocated buffer, and the host is notified that the command is complete once the data transmission finishes; if it is a read command, the operation request is submitted to the mapping table management module.

The mapping table management module allocates the corresponding physical address according to the logical address (write command) or converts the logical address into a NAND physical address (read command).

The operation request is submitted to the back-end module, which initiates a NAND read/write request according to the physical address and waits for the NAND read/write operation to complete.

If it is a read command, once the data is ready, the data transfer from the NAND cache register to the host is started.

Refer to the host file data copy process shown in Figure 2, which specifically includes the following implementation process:

S0: Get the logical addresses of the source and destination data to be copied.

S1: The host reads source data segment 1.

S2: SSD converts LBA to LPA; queries the physical address of the data to be read; initiates reading of the corresponding physical address.

S3: SSD returns data to the host.

S4: The host writes data segment 1 to the destination address.

S5: Convert LBA to LPA; allocate the physical address of the data to be written and update the mapping table; initiate writing of the corresponding physical address.

Repeat S1-S5 until all data segments have been copied.

In the above process, the source data must be read out of the SSD/NAND in sequence, transmitted to host memory over the bus (PCIe), transmitted back to the SSD over the bus, and then written to NAND. The large number of operations involved greatly limits the performance of data copying; at the same time, the duplicate data is written to NAND, which consumes the life of the SSD.

Based on this, this application proposes an SSD limited window data deduplication and identification method, which aims to identify the characteristics of this type of repeated data, so as to optimize the performance of the SSD and reduce the amount of writing to the SSD.

In one embodiment, as shown in Figure 3, an SSD limited window data deduplication and identification method is provided, which method includes:

Step 302: Obtain the command issued by the host and determine whether the command is a read command;

Step 304, if the command is a read command, complete the data reading according to the normal path, divide the read command into data segments, and calculate its data summary for each segment of data;

Step 306: Add the read data segment logical address, data summary and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

Step 308: If the command is not a read command, continue to determine whether the command is a write command;

Step 310, if the command is a write command, allocate a write buffer and receive data from the host, divide the write command into data segments, and calculate its data summary for each segment of data;

Step 312, determine whether the data digest of each data segment of the write command is consistent with the data digest in the read data deduplication window;

Step 314, if the data digests are consistent, further determine whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window;

Step 316, if the data content is also consistent, the corresponding read data segment logical address and write data segment logical address are sent to the write deduplication application module, which copies the mapping table instead of writing the write data to NAND.

In this embodiment, a method for deduplication and identification of SSD limited window data is provided, which can effectively identify data duplication in such limited command window scenarios, providing feasibility for further optimizing performance and improving lifespan. Specifically, it includes the following steps:

First, obtain the command issued by the host and determine whether the command is a read command; if the command is a read command, complete the data reading according to the normal path, divide the read command into data segments, and calculate a data digest for each data segment.

Refer to the schematic diagram of the limited window deduplication module shown in Figure 5. A deduplication module is added to the command processing path, which is subdivided into three sub-modules: read deduplication window, write deduplication identification, and write deduplication application:

The read deduplication window is used to maintain the information of the latest read data segment inside the SSD: logical address, data summary, and data content to match the newly written data segment.

Write deduplication identification is used to generate a data digest for each data segment newly written by the host and compare it with the records saved in the read deduplication window queue to determine whether the digests are the same; where the digests are the same, the data itself is further compared to confirm whether it is completely consistent; for write commands whose digest and data content both match, the logical address of the write data segment and the logical address of the hit read data segment are sent to the write deduplication application module.

The write deduplication application module is used to copy the mapping table of the read logical address to the mapping table of the write logical address according to the received logical addresses of the write data segment and the read data segment to complete the data copy.
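As a rough illustration of the write deduplication application module, the sketch below models the mapping table as a plain Python dict from logical address to physical address. The function name `apply_write_dedup` and the dict representation are hypothetical; this application does not specify how the mapping table is stored.

```python
def apply_write_dedup(l2p, read_lba, write_lba):
    """Copy the mapping of read_lba onto write_lba instead of writing to NAND.

    l2p is an assumed dict: logical address -> physical address.
    """
    l2p[write_lba] = l2p[read_lba]
    return l2p[write_lba]


# Logical address 100 was read earlier and lives at physical address 7.
mapping = {100: 7}
# The host now writes identical data to logical address 200.
apply_write_dedup(mapping, read_lba=100, write_lba=200)
```

After the call both logical addresses resolve to the same physical address, so no NAND program operation is needed for the write in this toy model.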

In this embodiment, the SSD internally maintains a read data deduplication window of a certain depth, including the logical address, data summary and data content of the recently read data segment. When the host writes data, the SSD internally generates a data summary for the written data segment in real time, and compares the summary of the written data segment with the data summary in the current read data deduplication window. For data summary matching scenarios, by further comparing the corresponding data content, the SSD can independently identify duplicate data within a limited window, and then use some mapping table copying and other methods to reduce data writing and improve SSD reliability. At the same time, the performance of intra-disk data copying is greatly improved.

In one embodiment, as shown in Figure 4, an SSD limited window data deduplication and identification method is provided. In this method, the read command is divided into data segments, and the step of calculating the data summary for each segment of data also includes:

Step 402: Divide the read command into multiple data segments according to a certain size. Each data segment contains the read data segment logical address, data summary and data content;

Step 404: Maintain the read data segments recently accessed by the host from far to near according to time to form a read deduplication queue;

Step 406: In the read data deduplication window, only a certain number of recently accessed read data segments are kept, and further records are discarded to reduce resource overhead and hit check costs.

Referring to the schematic diagram of read deduplication window maintenance shown in Figure 6, specifically, the read command is first divided into data segments according to a certain size (customized, it can be 4KB/8KB/...), and each data segment contains the following information:

Read data segment logical address: the starting logical address (LBA) of the data segment;

Data digest: the digest of the data segment, computed with a common algorithm such as SHA or MD5 and used for rough comparison (fast, but with a certain probability of misjudgment);

Data content: the original content of the data segment, used for exact comparison (slower).

Then, the read data segments most recently accessed by the host are maintained in order from oldest to newest to form a read deduplication queue.

Finally, in the read data deduplication window, only a certain number of recently accessed read data segments are kept (the number can be customized, such as 4/8/16...), and older records are discarded, reducing resource overhead and the cost of hit checking.
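The window maintenance described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the names `ReadRecord` and `ReadDedupWindow`, the 4KB segment size, SHA-256 as the digest, and logical addresses counted in whole segments are all choices made here, not details fixed by this application.

```python
import hashlib
from collections import deque
from typing import NamedTuple

SEGMENT_SIZE = 4096  # assumed segment size (the text allows 4KB/8KB/...)


class ReadRecord(NamedTuple):
    lba: int        # starting logical address of the segment (segment units)
    digest: bytes   # data digest, for the fast rough comparison
    content: bytes  # original data, for the exact comparison


class ReadDedupWindow:
    def __init__(self, depth=8):
        # deque(maxlen=depth) drops the oldest record automatically, which
        # keeps the number of records in the window constant.
        self.records = deque(maxlen=depth)

    def add_read(self, start_lba, data):
        # Split the read command into segments and append them, oldest first.
        for off in range(0, len(data), SEGMENT_SIZE):
            chunk = data[off:off + SEGMENT_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.records.append(
                ReadRecord(start_lba + off // SEGMENT_SIZE, digest, chunk))

    def find(self, digest):
        # Search from the newest record back to the oldest.
        for rec in reversed(self.records):
            if rec.digest == digest:
                return rec
        return None


window = ReadDedupWindow(depth=2)
window.add_read(0, bytes(3 * SEGMENT_SIZE))  # three segments, window keeps two
```

Using a bounded deque is one simple way to get the "move the window" behaviour; a firmware implementation would more likely use a fixed ring buffer.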

In this embodiment, in a typical data copy scenario, N pieces of data are generally read and N pieces of data are written, so the write command will basically hit the latest N data segments of the read command. According to this N, the size of the read deduplication window (number of data segments) can be adjusted to improve efficiency.

In one embodiment, an SSD limited window data deduplication and identification method is provided. The method also includes: if the data digests are consistent, XOR the data content of the write data segment with the data content of the hit read data segment and determine whether the result is 0; if the result is 0, the data is judged to be a duplicate, and the logical address of the write data segment and the logical address of the hit read data segment are sent to the write deduplication application module. The write deduplication application module copies the mapping table of the read data segment logical address to the mapping table of the write data segment logical address to complete the data copy.

In this embodiment, first, the summary of the written data segment is compared with the data summary in the current read data deduplication window. If they match, further data matching is performed. For data digest matching scenarios, the corresponding read and write data are further XORed. If the result is 0, the data is confirmed to be duplicated.
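The XOR check can be written out as a short Python sketch. The function name `is_duplicate` is hypothetical, and treating segments of different length as non-duplicates is an assumption added here so that the function is well defined.

```python
def is_duplicate(write_data: bytes, read_data: bytes) -> bool:
    """Return True when the XOR of the two segments is 0."""
    if len(write_data) != len(read_data):
        return False  # assumption: differing lengths never count as duplicates
    xor = int.from_bytes(write_data, "big") ^ int.from_bytes(read_data, "big")
    return xor == 0


same = is_duplicate(b"\x01\x02\x03", b"\x01\x02\x03")       # identical segments
different = is_duplicate(b"\x01\x02\x03", b"\x01\x02\x04")  # last byte differs
```

The digest comparison filters out most non-matching segments cheaply; this exact check then rules out digest collisions before the mapping table is touched.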

A specific example is described below. With reference to the specific execution flow diagram shown in Figure 7, the method includes:

Step 7.1: Obtain a new command from the host.

Step 7.2: Determine whether it is a read command; if so, continue to step 7.3; if not, jump to step 7.7.

Step 7.3: Complete data reading according to the normal path.

Step 7.4: Divide the read command into data segments, and calculate a data digest for each data segment.

Step 7.5: Add (data segment logical address, data digest, data content) to the read deduplication queue, and move the read data deduplication window to keep the number of records in the window constant.

Step 7.6: The command is completed.

Step 7.7: Determine whether it is a write command; if so, proceed to step 7.8; if not, proceed according to the normal path.

Step 7.8: Allocate a write buffer and receive data from the host.

Step 7.9: Divide the write command into data segments, and calculate a data digest for each data segment.

Step 7.10: Compare the data digest of each data segment of the write command with the data digests in the read data deduplication window.

Step 7.11: Determine whether they are equal; if so, proceed to step 7.12; if not, write the data to NAND according to the normal path.

Step 7.12: XOR the content of the write data segment with the data content of the hit read data segment.

Step 7.13: Determine whether the result is 0; if so, proceed to step 7.14; if not, write the data to NAND according to the normal path.

Step 7.14: The data is a duplicate; send the corresponding read data segment logical address and write data segment logical address to the write deduplication application module.

Step 7.15: Based on the logical addresses of the read and write data segments, the write deduplication application module copies the mapping table instead of writing the data to NAND.

Step 7.16: The command is completed.
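The flow of steps 7.1 to 7.16 can be tied together in a small end-to-end Python model. Everything named here (`ToySsd`, `SEG`, `nand_writes`, SHA-256 as the digest, addresses counted in segments, a byte equality test standing in for the XOR check) is an assumption for illustration. In particular, dropping window records for an address that is about to be overwritten is not described in this application; it is added only so that the toy model never hits on stale data.

```python
import hashlib
from collections import deque

SEG = 4096  # assumed data segment size in bytes


class ToySsd:
    def __init__(self, window_depth=8):
        self.l2p = {}         # logical segment address -> physical index
        self.nand = []        # physical index -> segment content
        self.nand_writes = 0  # segments actually programmed to "NAND"
        self.window = deque(maxlen=window_depth)  # (lba, digest, content)

    def read(self, lba, count):
        out = []
        for addr in range(lba, lba + count):
            chunk = self.nand[self.l2p[addr]]          # step 7.3
            digest = hashlib.sha256(chunk).digest()    # step 7.4
            self.window.append((addr, digest, chunk))  # step 7.5
            out.append(chunk)
        return b"".join(out)

    def write(self, lba, data):
        for off in range(0, len(data), SEG):           # step 7.9
            addr, chunk = lba + off // SEG, data[off:off + SEG]
            # Assumption (not in the source): forget records for the address
            # being overwritten so a stale record can never produce a hit.
            self.window = deque((r for r in self.window if r[0] != addr),
                                maxlen=self.window.maxlen)
            digest = hashlib.sha256(chunk).digest()
            # Steps 7.10-7.11: rough comparison by digest.
            hit = next((r for r in self.window if r[1] == digest), None)
            # Steps 7.12-7.13: exact comparison (equality instead of XOR).
            if hit is not None and hit[2] == chunk:
                self.l2p[addr] = self.l2p[hit[0]]      # steps 7.14-7.15
            else:
                self.nand.append(chunk)                # normal write path
                self.l2p[addr] = len(self.nand) - 1
                self.nand_writes += 1


ssd = ToySsd()
ssd.write(0, b"A" * SEG + b"B" * SEG)  # two real NAND writes
copied = ssd.read(0, 2)                # host reads the source data
ssd.write(10, copied)                  # host writes it to the destination
```

In this model the copy to address 10 adds no NAND writes: both destination segments are resolved by copying mapping entries from the segments that were just read.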

It should be understood that although the steps in the flowcharts of Figures 1-7 are shown in the sequence indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figures 1-7 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time and may be executed at different times; nor are they necessarily executed sequentially, but may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in Figure 8, an SSD limited window data deduplication identification device 800 is provided, which device includes:

The first judgment module 801 is used to obtain the command issued by the host and judge whether the command is a read command;

The read deduplication window module 802 is used to complete data reading according to the normal path if the command is a read command, divide the read command into data segments, and calculate a data digest for each data segment; and to add the read data segment logical address, data digest and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

The second judgment module 803 is used to continue to judge whether the command is a write command if the command is not a read command;

The write deduplication identification module 804 is used to allocate a write buffer and receive data from the host if the command is a write command, divide the write command into data segments, and calculate a data digest for each data segment;

The third judgment module 805 is used to judge whether the data digest of each data segment of the write command is consistent with the data digest in the read data deduplication window;

The fourth judgment module 806 is used to further judge whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window if the data digests are consistent, and, if the data content is also consistent, to send the corresponding read data segment logical address and write data segment logical address to the write deduplication application module;

The write deduplication application module 807 is used to copy the mapping table instead of writing the write data to NAND.

In one embodiment, the read deduplication window module 802 is also used to:

In one embodiment, the fourth judgment module 806 is also used to:

Regarding the specific limitations of the SSD limited window data deduplication identification device, please refer to the above limitations on the SSD limited window data deduplication identification method, which will not be described again here.

In one embodiment, a computer device is provided, the internal structure of which may be as shown in Figure 9. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an SSD limited window data deduplication and identification method.

Those skilled in the art can understand that the structure shown in Figure 9 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer equipment to which the solution is applied. A specific computer device may include more or fewer parts than shown, combine certain parts, or have a different arrangement of parts.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps in each of the above method embodiments are implemented.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the steps in each of the above method embodiments are implemented.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined in any way. To simplify the description, not all possible combinations of these technical features are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.

The above-described embodiments only express several implementation modes of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

An SSD limited window data deduplication and identification method, the method includes:

Obtain the command issued by the host and determine whether the command is a read command;

If the command is a read command, complete the data reading via the normal path, divide the read command into data segments, and calculate a data summary for each data segment;

Add the read data segment logical address, data summary and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

If the command is not a read command, continue to determine whether the command is a write command;

If the command is a write command, allocate a write buffer and receive data from the host, divide the write command into data segments, and calculate a data summary for each data segment;

Determine whether the data summary of each data segment of the write command is consistent with the data summary in the read data deduplication window;

If the data summaries are consistent, further determine whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window;

If the data content is also consistent, send the corresponding read data segment logical address and write data segment logical address to the write deduplication application module, so that the write deduplication application module copies the mapping table instead of writing the write data into NAND.
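
The flow recited in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not name a digest algorithm, a segment size or a window size, so SHA-256, 4096-byte segments and a 4-record window are assumed here, and `handle_command`, `Record` and the returned address pairs are hypothetical names.

```python
import hashlib
from collections import deque, namedtuple

SEGMENT_SIZE = 4096   # assumed; the claim only says "data segments"
WINDOW_RECORDS = 4    # assumed window capacity

# One record of the read deduplication window (hypothetical layout).
Record = namedtuple("Record", "lba digest content")

# A bounded deque drops its oldest record when a new one is appended.
window = deque(maxlen=WINDOW_RECORDS)


def segments(start_lba, data):
    """Yield (logical address, content) for each fixed-size segment."""
    for i in range(0, len(data), SEGMENT_SIZE):
        yield start_lba + i // SEGMENT_SIZE, data[i:i + SEGMENT_SIZE]


def digest(content):
    # SHA-256 is an assumption; any data summary function could stand in.
    return hashlib.sha256(content).digest()


def handle_command(kind, start_lba, data):
    """Return (read_lba, write_lba) pairs for the write dedup application module."""
    hits = []
    if kind == "read":
        # `data` stands for what the normal read path returned.
        for lba, content in segments(start_lba, data):
            window.append(Record(lba, digest(content), content))
    elif kind == "write":
        for lba, content in segments(start_lba, data):
            d = digest(content)
            for rec in window:
                # Stage 1 compares summaries; stage 2 compares full content.
                if rec.digest == d and rec.content == content:
                    hits.append((rec.lba, lba))
                    break
    return hits
```

A write segment that produces no pair would simply follow the normal write path to NAND; only the matched pairs are forwarded for mapping table copying.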
The SSD limited window data deduplication and identification method according to claim 1, characterized in that the step of dividing the read command into data segments and calculating a data summary for each data segment further includes:

Divide the read command into multiple data segments according to a certain size. Each data segment contains the read data segment logical address, data summary and data content;

Maintain the read data segments recently accessed by the host in time order, from oldest to most recent, to form a read deduplication queue.
The SSD limited window data deduplication identification method according to claim 2, characterized in that after the step of maintaining read data segments recently accessed by the host from far to near according to time and forming a read deduplication queue, it also includes:

In the read data deduplication window, only a certain number of the most recently accessed read data segments are kept, and older records are discarded to reduce resource overhead and hit check costs.
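
The bounded window described here can be illustrated with a fixed-capacity queue. The sketch below is an assumption-laden example rather than the claimed structure: `ReadDedupWindow` and its methods are hypothetical names, and the capacity of 3 is chosen only to make the eviction visible.

```python
from collections import deque


class ReadDedupWindow:
    """Fixed-capacity queue of read segments, oldest record first."""

    def __init__(self, capacity):
        self._records = deque(maxlen=capacity)

    def add(self, lba, digest, content):
        # Appending to a full deque drops the oldest record, so the
        # number of records in the window never exceeds the capacity.
        self._records.append((lba, digest, content))

    def lbas(self):
        """Logical addresses currently in the window, oldest first."""
        return [lba for lba, _, _ in self._records]


w = ReadDedupWindow(capacity=3)
for lba in range(5):
    w.add(lba, digest=b"", content=b"")
# Only the three most recent segments (LBAs 2, 3 and 4) remain.
```

Because a hit check scans at most `capacity` records, the cost of each lookup and the memory held by the window stay bounded regardless of how much data the host has read.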
The SSD limited window data deduplication identification method according to any one of claims 1 to 3, characterized in that the step of further determining, if the data summaries are consistent, whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window includes:

If the data summaries are consistent, XOR the data content of the write data segment with the data content of the hit read data segment and determine whether the result is 0;

If the result is 0, the data is judged to be duplicated, and the logical address of the write data segment and the logical address of the hit read data segment are sent to the write deduplication application module; the write deduplication application module copies the mapping table of the read data segment logical address to the mapping table of the write data segment logical address to complete the data copy.
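
The XOR check and the mapping table copy can be sketched as below. This is an illustrative example only: `xor_is_zero`, `copy_mapping` and the dictionary `l2p` (a stand-in for a logical-to-physical mapping table) are hypothetical, and the physical address value is made up for the example.

```python
def xor_is_zero(a: bytes, b: bytes) -> bool:
    """True when the two segments are byte-for-byte identical."""
    if len(a) != len(b):
        return False
    # Viewed as big integers, the XOR of the two buffers is 0
    # only when every bit of the two segments matches.
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big") == 0


def copy_mapping(l2p: dict, read_lba: int, write_lba: int) -> None:
    """Point write_lba at the physical location already holding the data."""
    l2p[write_lba] = l2p[read_lba]


l2p = {100: 0x5000}            # hypothetical logical-to-physical table
read_seg = b"\xab" * 4096      # content of the hit read data segment
write_seg = b"\xab" * 4096     # content of the incoming write data segment

if xor_is_zero(write_seg, read_seg):
    # Duplicate data: update the mapping instead of programming NAND.
    copy_mapping(l2p, read_lba=100, write_lba=200)
```

The full-content comparison guards against two different segments that happen to share the same data summary, so the mapping is copied only when the data is truly identical.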
An SSD limited window data deduplication and identification device, characterized in that the SSD limited window data deduplication and identification device includes:

A first judgment module, the first judgment module is used to obtain the command issued by the host and judge whether the command is a read command;

A read deduplication window module, the read deduplication window module is used to complete data reading via the normal path if the command is a read command, divide the read command into data segments, and calculate a data summary for each data segment; and to add the read data segment logical address, data summary and data content to the read deduplication queue, move the read data deduplication window and keep the number of records in the window constant;

a second judgment module, the second judgment module is used to continue to judge whether the command is a write command if the command is not a read command;

A write deduplication identification module, the write deduplication identification module is used to allocate a write buffer and receive data from the host if the command is a write command, divide the write command into data segments, and calculate a data summary for each data segment;

A third judgment module, the third judgment module is used to judge whether the data summary of each data segment of the write command is consistent with the data summary in the read data deduplication window;

A fourth judgment module, the fourth judgment module is used to further judge whether the data content of each data segment of the write command is consistent with the data content in the read data deduplication window if the data summaries are consistent, and, if the data content is also consistent, to send the corresponding read data segment logical address and write data segment logical address to the write deduplication application module;

A write deduplication application module, which is used to copy the mapping table instead of writing the write data into NAND.
The SSD limited window data deduplication identification device according to claim 5, characterized in that the read deduplication window module is also used to:

Divide the read command into multiple data segments according to a certain size. Each data segment contains the read data segment logical address, data summary and data content;

Maintain the read data segments recently accessed by the host in time order, from oldest to most recent, to form a read deduplication queue.
The SSD limited window data deduplication identification device according to claim 6, characterized in that the read deduplication window module is also used to:

In the read data deduplication window, only a certain number of the most recently accessed read data segments are kept, and older records are discarded to reduce resource overhead and hit check costs.
The SSD limited window data deduplication and identification device according to any one of claims 5 to 7, characterized in that the fourth judgment module is also used to:

If the data summaries are consistent, XOR the data content of the write data segment with the data content of the hit read data segment and determine whether the result is 0;

If the result is 0, the data is judged to be duplicated, and the logical address of the write data segment and the logical address of the hit read data segment are sent to the write deduplication application module; the write deduplication application module copies the mapping table of the read data segment logical address to the mapping table of the write data segment logical address to complete the data copy.
A computer device, including a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, the steps of the method described in any one of claims 1 to 4 are implemented.
A computer-readable storage medium with a computer program stored thereon, characterized in that when the computer program is executed by a processor, the steps of the method described in any one of claims 1 to 4 are implemented.