CN109240605B - Rapid repeated data block identification method based on 3D stacked memory - Google Patents

Rapid repeated data block identification method based on 3D stacked memory

Info

Publication number
CN109240605B
Authority
CN
China
Prior art keywords
fingerprint
data block
memory
data
fingerprints
Prior art date
Legal status
Active
Application number
CN201810937496.3A
Other languages
Chinese (zh)
Other versions
CN109240605A (en)
Inventor
曾令仿
程稳
蔡苒
李春艳
桑大邹
王芳
冯丹
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201810937496.3A
Publication of CN109240605A
Application granted
Publication of CN109240605B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 - Organizing or formatting or addressing of data
    • G06F 3/064 - Management of blocks
    • G06F 3/0641 - De-duplication techniques
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 - In-line storage system
    • G06F 3/0673 - Single storage device
    • G06F 3/0674 - Disk device
    • G06F 3/0676 - Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fast duplicate data block identification method based on a 3D stacked memory, comprising the following steps: sending data block fingerprints to the 3D stacked memory; retrieving and storing the data block fingerprints inside the 3D stacked memory; and returning the fingerprint retrieval results from the 3D stacked memory to the CPU. The 3D stacked memory, which stores the data block fingerprints, is formed by stacking a plurality of DRAM chips on a logic layer chip interconnected through TSV technology. The logic layer accesses the memory layer directly through the TSVs rather than over a data bus, which is fast, avoids unnecessary data movement on the bus, and reduces memory access time. The invention further classifies the data fingerprints, divides the memory chips into partitions, and stores each class of fingerprint in one partition; a plurality of computing units and one router are embedded in the logic layer, and the router forwards each data fingerprint to the corresponding computing unit, which avoids communication overhead between the computing units and reduces the energy consumption of fingerprint lookup.

Description

Rapid repeated data block identification method based on 3D stacked memory
Technical Field
The invention belongs to the technical field of computer storage, and particularly relates to a fast duplicate data block identification method based on a 3D stacked memory.
Background
Data deduplication is a redundancy elimination technology that can effectively reduce the volume of stored data, save storage space, and reduce the energy consumption of a data center. The deduplication process comprises stages such as data chunking, fingerprint calculation, and fingerprint retrieval. Fingerprint retrieval determines whether a data block is a duplicate by checking whether its fingerprint already exists in the index, and is one of the key steps of deduplication.
In a mass data storage system, fast index access can only be achieved in main memory, but the fingerprint index generated by massive data is huge, so part of the index has to be stored on hard disk. To speed up fingerprint retrieval, researchers have exploited characteristics of the data set to raise the hit rate of index data in memory and thereby reduce accesses to the slow hard disk. However, in current computer architectures programs and data reside in memory and the processor and memory are separate, so when a data block fingerprint is retrieved, fingerprints must constantly move between the CPU and memory over the bus. The resulting time overhead and energy consumption limit the speed of duplicate data block identification. Moreover, the performance gap between processor speed and memory transfer rate keeps growing, so the processor spends more and more time waiting for data from memory and the latency is unavoidable; all of these problems affect the efficiency of fingerprint transfer.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to solve the technical problem of the time overhead and energy consumption caused by the constant movement of fingerprints between the CPU and memory during fingerprint retrieval in the prior art.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a method for identifying a duplicate data block based on a 3D stacked memory, where the method includes:
(1) sending the data block fingerprints to a 3D stacked memory;
(2) completing retrieval and storage of data block fingerprints in a 3D stacked memory;
(3) the 3D stacked memory returns the fingerprint retrieval result to the CPU.
Specifically, the 3D stacked memory includes a logic layer and a storage layer; the logic layer includes a router and a plurality of PEs (computing units), and the storage layer is divided into a plurality of channels.
Specifically, the step (1) specifically includes:
(1.1) the CPU reads the data block fingerprint from the cache;
(1.2) sending the data block fingerprints to the router of the logic layer through a bus;
(1.3) treating each data block fingerprint as a hexadecimal character string, the first character of which is one of '0'-'9' and 'A'-'F'.
Specifically, the step (2) specifically includes:
(2.1) the router forwards the data block fingerprint to the corresponding PE according to the first character of the data block fingerprint;
(2.2) the PE inserts the data block fingerprint into the tail of its request queue, takes a fingerprint h from the head of the request queue, and sends it to an arithmetic unit and a comparator;
(2.3) the arithmetic unit performs a hash calculation with the fingerprint h as the key to obtain the storage position of the fingerprint h in the hash table, and sends the storage address to the memory controller;
(2.4) the memory controller reads the data g at the storage address from the storage layer channel, puts it into a buffer, and simultaneously sends it to the comparator;
(2.5) the comparator compares the fingerprint h with the fingerprint of the data g, and the PE determines the type of command to send to the memory controller according to the comparison result;
(2.6) the memory controller decides whether to write the fingerprint h into the storage layer according to the type of command received.
Specifically, the step (2.6) specifically includes:
If the two fingerprints are the same, the fingerprint already exists, and the data block is a duplicate that does not need to be stored; if they differ, the fingerprint does not exist, the data block is a new data block, and the fingerprint needs to be inserted into the hash table and stored in the corresponding storage layer channel.
Specifically, the step (3) specifically includes:
(3.1) the PE sends the fingerprint comparison result to the router;
(3.2) the router returns the fingerprint comparison result to the CPU;
(3.3) the CPU processes the data block according to the result: if it is a new data block, the data block is stored on a magnetic disk; if it is a duplicate data block, no processing is performed.
To achieve the above object, in a second aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above duplicate data block identification method based on a 3D stacked memory.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
(1) The invention stores the data block fingerprints in a 3D stacked memory formed by stacking a plurality of DRAM chips on a logic layer chip and interconnecting them through TSV technology. The logic layer accesses the memory layer directly through the TSVs rather than over a data bus, which is fast, avoids unnecessary data movement on the bus, and reduces memory access time.
(2) The invention classifies the data fingerprints, divides the memory chips into partitions, and stores each class of fingerprint in one partition; a plurality of computing units and one router are embedded in the logic layer, and the router forwards the data fingerprints to the corresponding computing units, which avoids communication overhead between the computing units and reduces the energy consumption of fingerprint lookup.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a schematic diagram of a partition of a 3D stacked memory according to the present invention;
FIG. 3 is a schematic structural diagram of a 3D stacked memory logic layer according to the present invention;
FIG. 4 is a diagram illustrating the structure of PEs in the logic layer according to the present invention;
FIG. 5 is a flowchart of a fast duplicate data block identification method based on a 3D stacked memory according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a system architecture diagram of the present invention. As shown in FIG. 1, the system architecture includes a CPU 10, a crossbar switch 20, and a 3D stacked memory 30. In this architecture, the CPU 10 communicates with the 3D stacked memory 30 through the crossbar 20. To provide higher bandwidth with lower latency, hardware architects have proposed "Processing in Memory" (PIM), also known as "Near Data Processing" (NDP), in which one or more lightweight processors (PIM cores) are placed near the memory; a PIM core can access the memory much faster than the CPU.
3D stacked memory is an emerging storage technology comprising a logic chip and a plurality of dynamic random access memory (DRAM) chips, stacked and interconnected by 3D packaging and through-silicon via (TSV) technology; it is characterized by high density, high bandwidth, large capacity, and low energy consumption. The 3D stacked memory 30 includes a logic layer and a storage layer. The logic layer includes a router 310 and a plurality of computing units (PEs) 320. The storage layer is divided into a plurality of memory channels 330.
The router 310 serves as the bridge between the CPU 10 and the PEs 320; no communication is required between any two PEs 320, which reduces the communication overhead of the logic layer. Each PE 320 is responsible for managing one memory channel 330 and executing the fingerprint retrieval logic.
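For illustration, the following C sketch models the partitioned architecture of FIG. 1 in software. It is a minimal simulation-style model, not the disclosed hardware: the type and field names (fingerprint_t, channel_t, pe_t, QUEUE_DEPTH, and so on) and the 40-character fingerprint length are illustrative assumptions.

```c
#include <stddef.h>

#define NUM_PARTITIONS 16   /* one vertical partition per first hex character              */
#define FP_HEX_LEN     40   /* fingerprint length in hex characters (assumed, matches the
                               example fingerprint shown later in the description)          */
#define QUEUE_DEPTH    64   /* illustrative request-queue depth                            */

/* A data block fingerprint, treated as a hexadecimal character string. */
typedef struct {
    char hex[FP_HEX_LEN + 1];
} fingerprint_t;

/* One storage layer channel: a hash table of fingerprints held in stacked DRAM. */
typedef struct {
    fingerprint_t *slots;       /* hash table slots inside the DRAM channel */
    size_t         num_slots;
} channel_t;

/* One logic layer computing unit (PE): manages exactly one channel. */
typedef struct {
    fingerprint_t request_queue[QUEUE_DEPTH];  /* FCFS queue of pending fingerprints */
    int           head, tail;
    channel_t    *channel;                     /* the memory channel this PE owns    */
} pe_t;

/* The logic layer: one router plus sixteen PEs, one per partition. */
typedef struct {
    pe_t pes[NUM_PARTITIONS];
} logic_layer_t;
```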
After the CPU 10 generates a data block fingerprint, the fingerprint is transmitted over the bus to the logic layer of the 3D stacked memory 30; after the logic layer executes the retrieval operation, the retrieval result is returned to the CPU 10.
It has been observed that data block fingerprints obey the following rule: when a data block fingerprint is treated as a hexadecimal character string, the first character of the string is one of '0'-'9' and 'A'-'F', so the fingerprints can be classified into sixteen types according to the first character, and the number of fingerprints of each type is roughly the same. FIG. 2 is a schematic diagram of 3D stacked memory partitioning according to the present invention. A plurality of DRAM chips are stacked to form the storage layer of the 3D stacked memory, and the stack is divided vertically; each vertical region is referred to as a "logical channel". As shown in fig. 2, to exploit the parallelism of the 3D stacked memory itself, the whole 3D stacked memory is divided evenly into 16 vertical partitions, and each partition contains one storage layer channel and one logic layer PE, so the storage layer contains 16 channels and the logic layer contains 16 PEs. Each PE corresponds to one channel, and each channel stores one type of fingerprint; for example, PE0 is responsible for managing channel0, and channel0 stores the fingerprint strings whose first character is '0'. The 16 channels can therefore process 16 data block fingerprints in parallel. The invention uses a hash table to organize the fingerprints stored in each channel. For example, the data fingerprint h: 322341F30AAA93EA52C16B2A302D600C06DB5C3F has first character '3' and should therefore be stored in channel3, shown as the gray vertical partition in FIG. 2.
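The classification by first character can be illustrated with a small helper that maps a fingerprint string to its partition index; the function name partition_index is an assumption used only in these sketches.

```c
#include <ctype.h>

/* Map a fingerprint's first hexadecimal character ('0'-'9', 'A'-'F') to its
 * partition index 0..15; returns -1 for an invalid character. */
static int partition_index(const char *fp_hex)
{
    char c = (char)toupper((unsigned char)fp_hex[0]);
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Example from the text: the fingerprint beginning with '3' maps to channel3, i.e.
 * partition_index("322341F30AAA93EA52C16B2A302D600C06DB5C3F") == 3. */
```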
FIG. 3 is a schematic structural diagram of the logic layer of the 3D stacked memory according to the present invention. As shown in FIG. 3, the logic layer contains one router and 16 PEs. Because the channel in which a fingerprint is stored is determined by its first character, an arriving fingerprint must be forwarded to the corresponding PE for processing, and a router is placed in the logic layer to control this forwarding. The router is responsible for communication between the CPU and the PEs, so two queues are set up on the router: a request queue and a result queue, used respectively for forwarding fingerprints to the PEs and for returning retrieval results to the CPU. The request queue of the router holds the data block fingerprints awaiting matching and processes them according to an FCFS (first come, first served) policy, while the result queue of the router holds the fingerprint retrieval results. The router is also provided with a global buffer for temporarily storing data block fingerprints and a simple computing unit that handles the forwarding logic, as sketched below.
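Building on the sketches above, the router's forwarding behavior might look as follows; pe_enqueue and router_forward are illustrative helper names, and queue-overflow handling and the result queue are omitted.

```c
/* Insert a fingerprint at the tail of a PE's FCFS request queue. */
static void pe_enqueue(pe_t *pe, const fingerprint_t *fp)
{
    pe->request_queue[pe->tail] = *fp;
    pe->tail = (pe->tail + 1) % QUEUE_DEPTH;
}

/* Router forwarding step: take fingerprints from the router's request queue
 * in FCFS order and forward each one to the PE selected by its first
 * character.  Draining the result queue back to the CPU is not shown. */
static void router_forward(logic_layer_t *logic,
                           const fingerprint_t *incoming, int count)
{
    for (int i = 0; i < count; i++) {
        int idx = partition_index(incoming[i].hex);
        if (idx >= 0)
            pe_enqueue(&logic->pes[idx], &incoming[i]);
    }
}
```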
FIG. 4 is a diagram illustrating the structure of a PE in the logic layer according to the present invention. As shown in FIG. 4, each PE includes a memory controller, a comparator, an arithmetic unit, and a buffer.
The PE inserts the data block fingerprints forwarded by the router into the tail of its request queue; the request queue resides in the buffer and temporarily stores the data block fingerprints. Because each PE processes one type of fingerprint, fingerprints with the same first character that arrive in succession must wait, so each PE maintains its own request queue of data block fingerprints and processes them one by one using the FCFS policy. The PE takes a fingerprint h from the head of the request queue, sends it to the arithmetic unit and the comparator, and later sends the comparison result back to the router.
The arithmetic unit executes a character string hash function: using the fingerprint h as the key, it performs the hash calculation to obtain the storage position of the fingerprint h in the hash table and sends that storage address to the memory controller. Each arithmetic unit comprises an adder, a shifter, a logic unit, and a multiplication unit. In addition, because the memory controller of each channel is fully customizable, the control logic can be implemented by hardware programming. The main operations of the string hash function are logical exclusive-or, addition, and multiplication, and the comparison after the calculation (whether the corresponding fingerprint in the hash table equals the data block fingerprint) can be regarded as a logical AND operation. Because each PE processes its own fingerprints, the 16 PEs can process 16 data block fingerprints in parallel without interfering with one another; no communication is needed among the PEs, which saves logic layer communication overhead. Each PE issues the corresponding operation command (read/write) according to its comparison result, and the memory controller performs the corresponding operation on the storage layer according to the command type.
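The patent does not fix a particular string hash function; the sketch below uses FNV-1a purely as an example of a hash built from the exclusive-or and multiplication operations the arithmetic unit supports, reduced modulo the channel's slot count to give the storage address.

```c
#include <stdint.h>
#include <string.h>

/* Example string hash (FNV-1a) mapping a fingerprint to a hash table slot.
 * Any hash composed of XOR, addition and multiplication could be substituted. */
static size_t fingerprint_slot(const fingerprint_t *fp, size_t num_slots)
{
    uint64_t h = 1469598103934665603ULL;               /* FNV offset basis      */
    for (size_t i = 0; i < strlen(fp->hex); i++) {
        h ^= (uint64_t)(unsigned char)fp->hex[i];      /* XOR in next character */
        h *= 1099511628211ULL;                         /* FNV prime (multiply)  */
    }
    return (size_t)(h % num_slots);                    /* position in hash table */
}
```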
The memory controller reads the data stored at the storage address from the DRAM and sends it to the local buffer and the comparator.
The comparator compares the fingerprint h taken from the head of the request queue with the fingerprint of the data read from the DRAM; whether the fingerprint h is written into the storage layer depends on the comparison result.
The duplicate data block identification process is as follows: first, the fingerprint is hashed with the hash function to determine its storage position in the hash table, and the position information is sent to the memory controller; the memory controller issues a read command and reads the data at that position over the TSV connection; the data is then compared with the fingerprint. If they are the same, the fingerprint already exists and the data block is a duplicate that does not need to be stored; if they differ, the fingerprint does not exist, the data block is a new data block, and the fingerprint must be inserted into the hash table and stored in the corresponding storage layer channel. The whole process is completed inside the 3D stacked memory without involving the CPU or the bus, so data movement is reduced in line with the PIM idea.
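The control flow of this process, expressed in the same illustrative C model, is sketched below; collision and empty-slot handling in the hash table are deliberately omitted, so this shows only the decision logic, not the hardware implementation.

```c
#include <string.h>

typedef enum { BLOCK_DUPLICATE, BLOCK_NEW } lookup_result_t;

/* One fingerprint lookup inside a PE: hash the fingerprint, read the slot
 * from the PE's own DRAM channel (via the memory controller and TSVs),
 * compare, and either report a duplicate or insert the new fingerprint. */
static lookup_result_t pe_lookup_or_insert(pe_t *pe, const fingerprint_t *h)
{
    size_t addr = fingerprint_slot(h, pe->channel->num_slots);
    fingerprint_t *g = &pe->channel->slots[addr];   /* data read at the storage address */

    if (strcmp(g->hex, h->hex) == 0)
        return BLOCK_DUPLICATE;   /* fingerprint exists: duplicate block, nothing stored */

    *g = *h;                      /* fingerprint absent: insert into the hash table */
    return BLOCK_NEW;             /* new data block: the CPU will store it on disk  */
}
```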
FIG. 5 is a flowchart of a fast duplicate data block identification method based on a 3D stacked memory according to an embodiment of the present invention. As shown in FIG. 5, the method comprises the following steps:
(1) Sending the data block fingerprints to the 3D stacked memory.
(2) Completing the retrieval and storage of the data block fingerprints in the 3D stacked memory.
(3) The 3D stacked memory returns the fingerprint retrieval result to the CPU through the bus.
Step (1), sending the data block fingerprint to the 3D stacked memory, specifically comprises the following steps:
(1.1) The CPU reads the data block fingerprint from the cache.
(1.2) The data block fingerprint is sent to the router of the logic layer through the bus.
(1.3) The data block fingerprint is treated as a hexadecimal character string whose first character is one of '0'-'9' and 'A'-'F'.
The CPU uses a chunking algorithm to divide the original data into data blocks, and applies a secure hash algorithm to each data block to obtain its data block fingerprint.
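As a host-side illustration, the sketch below computes a per-chunk fingerprint with OpenSSL's SHA-1 (SHA-1 matches the 40-hex-digit example fingerprint, but the patent only requires "a secure hash algorithm"); fixed-size chunking and the helper name chunk_fingerprint are assumptions.

```c
#include <openssl/sha.h>
#include <stdio.h>

/* Compute the hexadecimal fingerprint of one data chunk.  SHA-1 is assumed
 * here; any secure hash producing a hex-string fingerprint would work. */
static void chunk_fingerprint(const unsigned char *chunk, size_t len,
                              fingerprint_t *out)
{
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1(chunk, len, digest);
    for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        sprintf(&out->hex[2 * i], "%02X", digest[i]);   /* uppercase hex string */
}
```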
Step (2) retrieving and storing the data block fingerprints in the 3D stacked memory, which specifically comprises the following steps:
and (2.1) forwarding the data block fingerprint to a corresponding PE by the route according to the first character of the data block fingerprint.
And (2.2) the PE inserts the data block fingerprint into the tail of the request queue of the PE, takes out the fingerprint h from the head of the request queue and sends the fingerprint h to the arithmetic unit and the comparator.
And (2.3) the arithmetic unit takes the fingerprint h as a key to execute hash calculation, obtains the storage position of the fingerprint h in the hash table, and sends the storage address to the memory controller.
(2.4) the memory controller reads the data g of the storage address from the storage layer channel, puts the data g into a buffer and simultaneously sends the data g into the comparator.
And (2.5) the comparator compares the fingerprint h with the fingerprint of the data g, and the PE determines the type of the command to be sent to the memory controller according to the comparison result.
(2.6) the memory controller decides whether to write the fingerprint h into the storage layer or not according to the type of the received command.
If the fingerprint is the same as the data block, the fingerprint is determined to exist, and the data block is a repeated data block and does not need to be stored; if the data blocks are different, the fingerprint does not exist, the data blocks are new data blocks, and the fingerprint needs to be inserted into the hash table and stored into the corresponding storage layer channel.
Step (3), in which the 3D stacked memory returns the fingerprint retrieval result to the CPU through the bus, specifically comprises the following steps:
(3.1) The PE sends the comparison result to the router.
(3.2) The router returns the fingerprint comparison result to the CPU.
(3.3) The CPU processes the data block according to the result, as sketched below: if it is a new data block, the data block is stored on a magnetic disk; if it is a duplicate data block, no processing is performed.
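A sketch of the CPU-side handling in step (3.3), continuing the same illustrative model; store_block_to_disk is a hypothetical host storage routine declared here only so the sketch compiles.

```c
/* Hypothetical host storage path, not part of the patent disclosure. */
extern void store_block_to_disk(const unsigned char *block, size_t len);

/* Act on the retrieval result returned by the 3D stacked memory. */
static void handle_result(lookup_result_t r,
                          const unsigned char *block, size_t len)
{
    if (r == BLOCK_NEW)
        store_block_to_disk(block, len);   /* new data block: persist it on disk */
    /* BLOCK_DUPLICATE: the block already exists, no further processing needed  */
}
```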
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A fast duplicate data block identification method based on a 3D stacked memory, characterized by comprising the following steps:
(1) sending the data block fingerprints to a 3D stacked memory;
(2) completing retrieval and storage of data block fingerprints in a 3D stacked memory;
(3) the 3D stacked memory returns the fingerprint retrieval result to the CPU;
the 3D stacked memory comprises a logic layer and a memory layer, the logic layer accesses the memory layer through TSVs, the logic layer comprises a router and a plurality of computing units, the memory layer is divided into a plurality of memory channels, the router forwards received data fingerprints to the computing units according to their first characters, each computing unit is responsible for managing one memory channel and executing the fingerprint retrieval logic, and each memory channel stores data fingerprints having the same first character.
2. The identification method according to claim 1, wherein the step (1) specifically comprises:
(1.1) the CPU reads the data block fingerprint from the cache;
(1.2) sending the data block fingerprints to the router of the logic layer through a bus;
(1.3) treating each data block fingerprint as a hexadecimal character string, the first character of which is one of '0'-'9' and 'A'-'F'.
3. The identification method according to claim 1, wherein the step (2) specifically comprises:
(2.1) the router forwards the data block fingerprints to the corresponding computing units according to the first characters of the data block fingerprints;
(2.2) the computing unit inserts the data block fingerprint into the tail of a request queue of the computing unit, takes out the fingerprint h from the head of the request queue and sends the fingerprint h to an arithmetic unit and a comparator;
(2.3) the arithmetic unit takes the fingerprint h as a key to execute Hash calculation to obtain a storage address of the fingerprint h in a Hash table, and sends the storage address to the memory controller;
(2.4) the memory controller reads the fingerprint of the data g at the storage address from the memory channel of the storage layer, puts it into a buffer, and simultaneously sends it to the comparator;
(2.5) the comparator compares the fingerprint h with the fingerprint of the data g, and the calculating unit determines the type of the command to be sent to the memory controller according to the comparison result;
(2.6) the memory controller decides whether to write the fingerprint h into the storage layer or not according to the type of the received command.
4. The identification method according to claim 3, characterized in that said step (2.6) comprises in particular:
if the two fingerprints are the same, the fingerprint already exists, and the data block is a duplicate that does not need to be stored; if they differ, the fingerprint does not exist, the data block is a new data block, and the fingerprint needs to be inserted into the hash table and stored in the corresponding memory channel of the memory layer.
5. The identification method according to claim 1, wherein the step (3) specifically comprises:
(3.1) the computing unit sends the fingerprint comparison result to the router;
(3.2) the router returns the fingerprint comparison result to the CPU;
(3.3) the CPU processes the data block according to the result: if it is a new data block, the data block is stored on a magnetic disk; if it is a duplicate data block, no processing is performed.
6. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the 3D stacked memory based fast duplicate data block identification method according to any of claims 1 to 5.
CN201810937496.3A 2018-08-17 2018-08-17 Rapid repeated data block identification method based on 3D stacked memory Active CN109240605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810937496.3A CN109240605B (en) 2018-08-17 2018-08-17 Rapid repeated data block identification method based on 3D stacked memory


Publications (2)

Publication Number Publication Date
CN109240605A CN109240605A (en) 2019-01-18
CN109240605B true CN109240605B (en) 2020-05-19

Family

ID=65070798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810937496.3A Active CN109240605B (en) 2018-08-17 2018-08-17 Rapid repeated data block identification method based on 3D stacked memory

Country Status (1)

Country Link
CN (1) CN109240605B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688065A (en) * 2020-07-30 2021-11-23 西安紫光国芯半导体有限公司 Near memory computing module and method, near memory computing network and construction method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706825B (en) * 2009-12-10 2011-04-20 华中科技大学 Replicated data deleting method based on file content types
CN102222085B (en) * 2011-05-17 2012-08-22 华中科技大学 Data de-duplication method based on combination of similarity and locality
US9904472B2 (en) * 2015-01-30 2018-02-27 Sandisk Technologies Llc Memory system and method for delta writes
CN105354246B (en) * 2015-10-13 2018-11-02 华南理工大学 A kind of data duplicate removal method calculated based on distributed memory
CN107391034B (en) * 2017-07-07 2019-05-10 华中科技大学 A kind of repeated data detection method based on local optimization
CN107590533B (en) * 2017-08-29 2020-07-31 中国科学院计算技术研究所 Compression device for deep neural network
CN108170253B (en) * 2017-12-28 2020-12-08 中国科学院计算技术研究所 Combined device comprising Hash partitioning accelerator and memory

Also Published As

Publication number Publication date
CN109240605A (en) 2019-01-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant