WO2016041127A1 - Data deduplication method and storage array - Google Patents

Data deduplication method and storage array

Info

Publication number
WO2016041127A1
Authority
WO
WIPO (PCT)
Prior art keywords
controller
data block
cache
address
feature value
Prior art date
Application number
PCT/CN2014/086530
Other languages
English (en)
French (fr)
Inventor
张巍
吕先红
魏明昌
张陈怡
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201480001884.0A priority Critical patent/CN105612489B/zh
Priority to JP2016547563A priority patent/JP6254293B2/ja
Priority to BR112016003763-4A priority patent/BR112016003763B1/pt
Priority to AU2014403332A priority patent/AU2014403332B2/en
Priority to KR1020167005272A priority patent/KR101716264B1/ko
Priority to PCT/CN2014/086530 priority patent/WO2016041127A1/zh
Priority to CA2920004A priority patent/CA2920004C/en
Priority to EP14898354.7A priority patent/EP3037949B1/en
Publication of WO2016041127A1 publication Critical patent/WO2016041127A1/zh
Priority to US15/449,083 priority patent/US20170177489A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1041 Resource optimization
    • G06F2212/1044 Space efficiency improvement

Definitions

  • the present invention relates to the field of information technology, and in particular, to a data deduplication method and a storage array.
  • a storage array typically includes an engine, and one engine contains two controllers, which is commonly referred to as a dual-controller architecture.
  • the storage array includes an input and output manager A and an input and output manager B, a controller A and a controller B.
  • the input/output manager A is connected to the controller A, and the input/output manager B is connected to the controller B.
  • the controller A includes a Peripheral Component Interconnect Express (PCIe) switch A, a Central Processing Unit (CPU) A, and a memory A.
  • the controller B includes a Peripheral Component Interconnect Express (PCIe) switch B, a Central Processing Unit (CPU) B, and a memory B.
  • PCIe switch A is connected to PCIe switch B.
  • the CPU A of the controller A divides the data to be written in the memory A into a plurality of data blocks, calculates the feature value of each data block, and searches for that feature value in the feature value index set of the controller A to judge whether the data block is a duplicate; if it is a duplicate data block, the data block is deleted; if it is not a duplicate data block, the data block is written to the hard disk.
  • the above-mentioned storage array deduplication process consumes the computing power of the controller's CPU and the memory resources of the controller, which seriously affects the performance of the storage array.
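  • As a minimal sketch of the controller-side deduplication described above (not the patent's implementation), the following Python example splits data into blocks, computes a feature value per block, and consults an index. The fixed 4096-byte block size, the use of SHA-256 as the feature value, and all function names are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the text does not specify one


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the data to be written into fixed-size data blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def feature_value(block: bytes) -> str:
    """Compute a feature value for a block; SHA-256 is an assumed choice."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(data: bytes, index: dict[str, int], disk: list[bytes]) -> list[int]:
    """Return, for each block, the disk slot that holds its content."""
    placement = []
    for block in split_blocks(data):
        fv = feature_value(block)
        if fv not in index:
            # Not a duplicate: write the block to the (simulated) hard disk.
            disk.append(block)
            index[fv] = len(disk) - 1
        # Duplicate or not, the block is now represented by an existing slot.
        placement.append(index[fv])
    return placement
```

  • Every step here (hashing, index lookup, holding the block in memory) runs on the controller, which is the CPU and memory cost that the text identifies as the problem.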
  • Embodiments of the present invention provide a data deduplication method and a storage array.
  • an embodiment of the present invention provides a data deduplication method, where the method is applied to a storage array, where the storage array includes a switching device, a first controller, and a cache device.
  • the first controller is connected to the switching device;
  • the cache device is connected to the switching device;
  • the switching device is connected to a hard disk in the storage array; and the method includes:
  • the first controller acquires, by using the switching device, the cache address of the to-be-deduplicated data block in the cache device;
  • the first controller sends a data read instruction to the controller of the target hard disk through the switching device;
  • the data read instruction carries the identifier of the cache device and the cache address;
  • the controller of the target hard disk reads the to-be-deduplicated data block from the cache address by using the switching device according to the identifier of the cache device and the cache address;
  • the controller of the target hard disk stores the to-be-deduplicated data block to the target hard disk.
  • the method further includes:
  • the controller of the target hard disk sends a target hard disk storage address to the first controller by using the switching device;
  • the target hard disk storage address includes a controller identifier of the target hard disk and the logical storage address at which the target hard disk stores the to-be-deduplicated data block;
  • the first controller establishes a feature value index of the to-be-deduplicated data block in the data block feature value index set; the feature value index of the to-be-deduplicated data block includes the feature value of the to-be-deduplicated data block and the target hard disk storage address.
  • the storage array further includes a second controller, the second controller being connected to the switching device;
  • the second controller stores the to-be-deduplicated data block address, and the second controller is the home controller of the target logical unit where the to-be-deduplicated data block is located; the first controller receiving the feature value of the to-be-deduplicated data block from the cache device specifically includes:
  • Determining, by the second controller, that the home controller of the feature value of the data block to be deduplicated is the first controller
  • the second controller sends the feature value of the to-be-deduplicated data block to the first controller by using the switching device.
  • the method further includes: the first controller sends a notification to the second controller by using the switching device, where the notification carries the target hard disk storage address;
  • the second controller establishes, according to the notification, a correspondence between the to-be-deduplicated data block address, the feature value of the to-be-deduplicated data block, and the target hard disk storage address.
  • the method further includes: the second controller establishing a correspondence between the to-be-deduplicated data block address, the feature value of the to-be-deduplicated data block, and the address of the first controller.
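  • The routing between the two controllers can be sketched as follows. The text does not say how the home controller of a feature value is chosen, so the modulo partition in `home_controller_of` is purely an assumption, as are the class and method names. The sketch combines both optional steps: the second controller first records the first controller's address, then replaces it with the target hard disk storage address when the notification arrives.

```python
from dataclasses import dataclass, field


def home_controller_of(feature_value: str, controllers: list[str]) -> str:
    """Assumed policy: partition hex feature values across controllers by modulo."""
    return controllers[int(feature_value, 16) % len(controllers)]


@dataclass
class SecondController:
    """Hypothetical model of the home controller of the target logical unit."""
    name: str
    controllers: list[str]
    # data block address -> (feature value, first controller address or storage address)
    table: dict[tuple[str, int], tuple[str, object]] = field(default_factory=dict)

    def route(self, block_addr: tuple[str, int], feature_value: str) -> str:
        """Determine the feature value's home controller and record the correspondence."""
        owner = home_controller_of(feature_value, self.controllers)
        self.table[block_addr] = (feature_value, owner)
        return owner

    def on_notification(self, block_addr: tuple[str, int],
                        storage_addr: tuple[str, int]) -> None:
        """Record the target hard disk storage address carried by the notification."""
        feature_value, _ = self.table[block_addr]
        self.table[block_addr] = (feature_value, storage_addr)
```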
  • an embodiment of the present invention provides a storage array, where the storage array includes a switching device, a first controller, and a cache device; wherein the first controller is connected to the switching device; the cache device is connected to the switching device; the switching device is connected to a hard disk in the storage array;
  • the first controller is configured to receive, from the cache device, a feature value of a data block to be deduplicated, and search for a feature value of the data block to be deduplicated in a data block feature value index set;
  • the first controller is further configured to acquire, by using the switching device, the cache address of the to-be-deduplicated data block in the cache device;
  • the first controller is further configured to send, by using the switching device, a data read instruction to a controller of the target hard disk; the data read instruction carries an identifier of the cache device and the cache address;
  • the controller of the target hard disk is configured to read, by the switching device, the to-be-deduplicated data block from the cache address according to the identifier of the cache device and the cache address;
  • the controller of the target hard disk is further configured to store the to-be-deduplicated data block to the target hard disk.
  • the controller of the target hard disk is further configured to send, by using the switching device, a target hard disk storage address to the first controller;
  • the target hard disk storage address includes a controller identifier of the target hard disk and the logical storage address at which the target hard disk stores the to-be-deduplicated data block;
  • the first controller is further configured to establish, in the data block feature value index set, a feature value index of the to-be-deduplicated data block; the feature value index of the to-be-deduplicated data block includes the feature value of the to-be-deduplicated data block and the target hard disk storage address.
  • the storage array further includes a second controller, the second controller is connected to the switching device, and the second controller is configured to store the to-be-deduplicated data block address; the second controller is the home controller of the target logical unit where the to-be-deduplicated data block is located; and the first controller receiving the feature value of the to-be-deduplicated data block from the cache device specifically includes:
  • Determining, by the second controller, that the home controller of the feature value of the data block to be deduplicated is the first controller
  • the second controller sends the feature value of the to-be-deduplicated data block to the first controller by using the switching device.
  • the first controller is further configured to send, by using the switching device, a notification to the second controller, where the notification carries the target hard disk storage address;
  • the second controller is further configured to establish, according to the notification, a correspondence between the to-be-deduplicated data block address, the feature value of the to-be-deduplicated data block, and the target hard disk storage address.
  • the second controller is further configured to establish a correspondence between the to-be-deduplicated data block address, the feature value of the to-be-deduplicated data block, and the first controller address.
  • the controller and the cache device are connected by the switching device; the first controller receives the feature value of the to-be-deduplicated data block from the cache device and searches for it in the data block feature value index set; when the same feature value is not found, the first controller sends the cache address of the to-be-deduplicated data block in the cache device to the controller of the target hard disk, and the controller of the target hard disk reads the to-be-deduplicated data block from that cache address.
  • the cache device calculates the fingerprint of the to-be-deduplicated data block, which saves the computing resources of the controller.
  • in the process of storing the to-be-deduplicated data block to the target hard disk, the controller only provides the cache address of the to-be-deduplicated data block, and the controller of the target hard disk directly reads the to-be-deduplicated data block from the cache address, thereby saving the computing resources and memory resources of the controller and improving the performance of the storage array.
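  • The flow summarized above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the patent's implementation: the class names, the `fabric` dictionary standing in for the switching device, and the use of SHA-256 as the feature value are all hypothetical. The point it shows is that the first controller handles only feature values and addresses, never the data block itself.

```python
import hashlib
from dataclasses import dataclass, field

StorageAddress = tuple[str, int]  # (disk controller identifier, logical storage address)


@dataclass
class CacheDevice:
    ident: str
    blocks: dict[int, bytes] = field(default_factory=dict)  # cache address -> data block

    def feature_value(self, cache_addr: int) -> str:
        # The cache device computes the fingerprint; SHA-256 is an assumed choice.
        return hashlib.sha256(self.blocks[cache_addr]).hexdigest()


@dataclass
class DiskController:
    ident: str
    media: list[bytes] = field(default_factory=list)

    def on_read_instruction(self, fabric: dict[str, CacheDevice],
                            cache_id: str, cache_addr: int) -> StorageAddress:
        # Read the block directly from the cache address and store it.
        self.media.append(fabric[cache_id].blocks[cache_addr])
        return (self.ident, len(self.media) - 1)


@dataclass
class FirstController:
    index: dict[str, StorageAddress] = field(default_factory=dict)

    def on_feature_value(self, fv: str, fabric: dict[str, CacheDevice], cache_id: str,
                         cache_addr: int, disk: DiskController) -> tuple[StorageAddress, bool]:
        """Return (storage address, was_duplicate) for the block at cache_addr."""
        if fv in self.index:
            return self.index[fv], True
        # No matching feature value: instruct the disk controller to pull the block.
        addr = disk.on_read_instruction(fabric, cache_id, cache_addr)
        self.index[fv] = addr  # establish the feature value index
        return addr, False
```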
  • FIG. 1 is a structural diagram of a prior art storage array.
  • FIG. 2 is a structural diagram of a storage array according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a data write request process according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a data write request process according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a data read request process according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a data block feature value index set
  • FIG. 7 is a flowchart of deduplication processing according to an embodiment of the present invention.
  • the storage array provided by the embodiment of the present invention includes an input and output manager A, a controller A, an input and output manager B, a controller B, a switching device A, a switching device B, and a cache device.
  • the controller A includes the CPU A and the memory A, the CPU A and the memory A communicate through the bus;
  • the controller B includes the CPU B and the memory B, and the CPU B and the memory B communicate via the bus.
  • the input and output manager A is connected to the switching device A and the switching device B, respectively, and the input and output manager B is connected to the switching device A and the switching device B, respectively.
  • Switching device A is interconnected with switching device B. Both switching device A and switching device B are connected to the cache device M.
  • the cache device M will be described in detail below.
  • the controller A is connected to the switching device A and the switching device B, respectively, and the controller B is connected to the switching device A and the switching device B, respectively.
  • this forms a fully interconnected architecture of the input/output manager A, the input/output manager B, the controller A, and the controller B.
  • switching device A is connected to all hard disks, and
  • switching device B is connected to all hard disks.
  • Controller A and controller B each communicate with all of the hard disks shown in FIG. 2.
  • the controller A communicates with all hard disks through the switching device A, and
  • the controller B communicates with all hard disks through the switching device B.
  • the controller A is used to virtualize the hard disk to form a logical unit LU A, which is provided for use by the host A.
  • the host A mounts the LU A, and the host A performs a data access operation on the LU A through the controller A.
  • the LU A belongs to the controller A, that is, controller A is the home controller of LU A.
  • the controller B is used to virtualize the hard disk to form a logical unit LU B, which is provided to the host B for use, and the host B mounts the LU B, and the host B performs a data access operation on the LU B through the controller B.
  • LU B belongs to controller B, ie controller B is the home controller of LU B.
  • the host here can be a physical host (or physical server) or a virtual host (or virtual server).
  • the logical unit (LU) is commonly referred to in the industry as a Logical Unit Number (LUN). Assigning a LUN to a host actually means assigning the identifier of an LU to the host so that the host can mount the LU. Therefore, LU has the same meaning as LUN here.
  • the switching devices A and B may be PCIe switching devices, Non-Volatile Memory Express (NVMe) switching devices, Serial Attached SCSI (SAS) switching devices, or the like, which is not limited in the embodiment of the present invention.
  • when the switching devices A and B are PCIe switching devices, the hard disk connected to the PCIe switching device is a hard disk with a PCIe protocol interface; when the switching devices A and B are NVMe switching devices, the hard disk connected to the NVMe switching device is a hard disk with an NVMe protocol interface.
  • the hard disk shown in FIG. 2 can be a mechanical hard disk, a Solid State Disk (SSD), or a hard disk of another medium.
  • different hard disks in the storage array shown in FIG. 2 may use different storage media, so as to form a hybrid hard disk storage array, which is not limited in the embodiment of the present invention.
  • the cache device M may be a storage device composed of a volatile storage medium or a non-volatile storage medium, such as a phase change memory (PCM), or another non-volatile storage medium suitable for use as a cache device, which is not limited in this embodiment of the present invention.
  • the cache device M is used to cache data.
  • the cache device M will be described below in conjunction with a specific embodiment of the present invention.
  • the switching device A is a PCIe switching device
  • the switching device B is a PCIe switching device
  • the hard disk is a PCIe protocol interface SSD.
  • the input/output manager A receives a data write request sent by the host.
  • controller A is the home controller of input/output manager A: when the input/output manager A receives a data operation request sent by the host and its request transmission policy has not been changed, it sends the request to the controller A by default, and the controller A is therefore called the home controller of input/output manager A.
  • the input/output manager A receives the data write request sent by the host, and sends a data write request to the controller A through the PCIe switching device A or the PCIe switching device B.
  • subsequently, the input/output manager A communicates with the controller A through the PCIe switching device.
  • the input/output manager A can also randomly select a PCIe switching device to communicate with the controller A, which is not limited in this embodiment of the present invention.
  • the embodiment of the present invention takes as an example the input/output manager A selecting the PCIe switching device A to communicate with the controller A.
  • the data write request received by the input output manager A carries the data address to be written.
  • the data address to be written includes the identifier of the target LU to be written data, the logical block address (LBA) of the data to be written, and the length of the data to be written.
  • Input Output Manager A sends a data write request to controller A.
  • the controller A receives the data write request, and determines whether the controller A is the home controller of the target LU by the identifier of the target LU of the data to be written in the data address to be written.
  • when controller A is the home controller of the target LU, that is, the target LU is formed by controller A virtualizing the hard disk and is provided to the host, the controller A determines a cache device for caching the data to be written, which is the cache device M in the embodiment of the present invention.
  • the controller A instructs the cache device M to allocate a cache address for the data to be written according to the data write request, and the cache device M allocates the cache address according to the length of the data to be written.
  • the controller A obtains the cache address allocated by the cache device M for the data to be written (hereinafter, this cache address is referred to as the cache address M; in one implementation, the cache address includes the cache start address and length).
  • the controller A sends the identifier of the cache device M and the cache address M to the input and output manager A through the PCIe switching device A.
  • the input/output manager A receives the identifier of the cache device M and the cache address M sent by the controller A, and writes the data to be written to the cache address M according to the identifier of the cache device M and the cache address M (also referred to as directly writing the data to be written to the cache address M).
  • the controller A only obtains the cache address M allocated for the data to be written, and the input/output manager A writes the data to be written directly to the cache address M through the PCIe switching device A, which, compared with the prior art, saves the computing resources of the CPU of the controller A and the memory resources of the controller A and improves data writing efficiency.
  • the controller A establishes a correspondence between the data address to be written, the identifier of the cache device M, and the cache address M, so that when the data to be written is read, the controller A sends the cache address of the data to be written to the input/output manager A, and the input/output manager A can read the data to be written from that cache address (that is, read the data directly from the cache address), thereby saving the computing resources and memory resources of the controller A and improving data reading efficiency.
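  • The write and read paths through the cache device can be sketched as below. This is a simplified model under assumptions: the class and function names are hypothetical, the cache device is a 1024-byte in-memory buffer, and allocation is a simple bump pointer. What it illustrates is the division of labor in the text: the controller only hands out and records addresses, while the input/output manager moves the data to and from the cache address directly.

```python
from dataclasses import dataclass, field

CacheAddress = tuple[int, int]  # (cache start address, length)


@dataclass(frozen=True)
class DataAddress:
    lu_id: str   # identifier of the target LU
    lba: int     # logical block address of the data to be written
    length: int  # length of the data to be written


@dataclass
class CacheDevice:
    ident: str
    memory: bytearray = field(default_factory=lambda: bytearray(1024))
    next_free: int = 0

    def allocate(self, length: int) -> CacheAddress:
        """Allocate a cache address according to the length of the data."""
        if self.next_free + length > len(self.memory):
            raise MemoryError("cache device is full")
        start = self.next_free
        self.next_free += length
        return (start, length)


@dataclass
class Controller:
    # data address -> (cache device identifier, cache address)
    mapping: dict[DataAddress, tuple[str, CacheAddress]] = field(default_factory=dict)

    def on_write_request(self, cache: CacheDevice, addr: DataAddress) -> tuple[str, CacheAddress]:
        # The controller only obtains the cache address; it never touches the data.
        return (cache.ident, cache.allocate(addr.length))

    def on_write_complete(self, addr: DataAddress, cache_id: str,
                          cache_addr: CacheAddress) -> None:
        self.mapping[addr] = (cache_id, cache_addr)


def io_manager_write(controller: Controller, cache: CacheDevice,
                     addr: DataAddress, data: bytes) -> None:
    if len(data) != addr.length:
        raise ValueError("data length does not match the data address")
    cache_id, (start, length) = controller.on_write_request(cache, addr)
    cache.memory[start:start + length] = data  # direct write to the cache address
    controller.on_write_complete(addr, cache_id, (start, length))


def io_manager_read(controller: Controller, cache: CacheDevice, addr: DataAddress) -> bytes:
    _, (start, length) = controller.mapping[addr]
    return bytes(cache.memory[start:start + length])  # direct read from the cache address
```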
  • the cache device M stores the data to be written into the target SSD of the storage array.
  • the target SSD refers to an SSD that stores data to be written.
  • the specific process of writing the data to be written to the target SSD may be that the controller A sends the identifier of the cache device M and the cache address M to the controller of the target SSD through the PCIe switching device A or the PCIe switching device B.
  • the controller of the target SSD directly reads the data to be written from the cache address M through the PCIe switching device A or the PCIe switching device B according to the identifier of the cache device M and the cache address M, and stores the data to be written.
  • the controller of the target SSD sends the target SSD storage address of the data to be written to the controller A through the PCIe switching device A or the PCIe switching device B.
  • the target SSD storage address of the data to be written includes a controller identifier of the target SSD and a logical storage address of the target SSD storing the data to be written.
  • the controller A establishes a correspondence relationship between the data address to be written and the target SSD storage address of the data to be written.
  • Step 301 The host sends a data write request to the input output manager A.
  • the input output manager A is an input/output receiving management device in the storage array, and is responsible for receiving a data operation request from the host and forwarding it to the controller.
  • the host sends a data write request carrying the address of the data to be written to the input and output manager A.
  • the data write request may use a Small Computer System Interface (SCSI) protocol, that is, a SCSI protocol data write request, and of course, other protocols may be used, which are not limited in the embodiment of the present invention.
  • Step 302 Send a data write request to controller A.
  • the input and output manager A typically communicates with a particular one of the controllers.
  • the input output manager A receives the data write request and sends a data write request to the controller A through the PCIe switching device A or the PCIe switching device B.
  • the embodiment of the present invention takes as an example the input/output manager A receiving the data write request and sending it to the controller A through the PCIe switching device A.
  • Step 303 The controller A acquires a cache address of the data to be written.
  • the controller A receives the data write request sent by the input/output manager A, and determines the cache device that caches the data to be written, which in the embodiment of the present invention is the cache device M.
  • the cache device M assigns a segment of cache addresses to the controller A.
  • the controller A allocates the cache address M for the data to be written from that segment of cache addresses according to the length of the data to be written.
  • alternatively, the controller A sends an instruction to the cache device M through the PCIe switching device A or the PCIe switching device B, where the instruction carries the length of the data to be written and instructs the cache device M to allocate a cache address for the data to be written, and the controller A obtains the cache address M.
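  • One reading of the controller-side allocation above is that the cache device M hands the controller A a segment of cache addresses in advance, and the controller A then carves cache addresses out of that segment by length. The sketch below follows that reading; the class name `SegmentAllocator`, the bump-pointer policy, and the error handling are assumptions for illustration, since the text does not describe how space inside the segment is managed.

```python
class SegmentAllocator:
    """Controller-side allocator over a pre-assigned segment of cache addresses."""

    def __init__(self, segment_start: int, segment_length: int) -> None:
        self.segment_start = segment_start
        self.segment_end = segment_start + segment_length
        self.next_free = segment_start

    def allocate(self, length: int) -> tuple[int, int]:
        """Return (cache start address, length), or raise if the segment is exhausted."""
        if length <= 0:
            raise ValueError("length must be positive")
        if self.next_free + length > self.segment_end:
            raise MemoryError("segment exhausted; a new segment would be needed")
        start = self.next_free
        self.next_free += length
        return (start, length)
```

  • With this approach the controller avoids a round trip to the cache device for every write, at the cost of having to manage the segment itself.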
  • Step 304 Send the identifier of the cache device M and the cache address M.
  • the controller A obtains the cache address M, and sends the identifier of the cache device M and the cache address M to the input and output manager A through the PCIe switching device A.
  • the identifier of the cache device M is .
  • Step 305 The host sends the data to be written to the input and output manager A.
  • the input/output manager A receives the identifier of the cache device M and the cache address M sent by the controller A, and receives the data to be written sent by the host.
  • Step 306 Write data to be written to the cache address M.
  • the input/output manager A writes the data to be written directly to the cache address M through the PCIe switching device A according to the identifier of the cache device M and the cache address M.
  • the input/output manager A receives the write success response of the data to be written sent by the cache device M through the PCIe switching device A.
  • the input output manager A sends a data write request completion response to the host, informing the host that the write request operation is completed.
  • Step 307 Notify the controller A that the data to be written has been written to the cache address M.
  • the input/output manager A successfully writes the data to be written to the cache address M, and notifies the controller A that the data to be written is written to the cache address M.
  • Step 308 The controller A establishes a correspondence between the data address to be written, the cache device M, and the cache address M.
  • the controller A receives the notification sent by the input/output manager A, and establishes a correspondence between the data address to be written, the cache device M, and the cache address M.
  • the cache device M allocates a cache address M to the data to be written, and establishes a correspondence between the data address to be written and the cache address M.
  • the cache device M can obtain the data address to be written from the cache address allocation instruction sent by the controller A, and after the cache device M allocates the cache address M, establish a correspondence between the data address to be written and the cache address M.
  • the cache device M is a dedicated cache device of the target LU, that is, only used to cache data of the target LU, and the cache device M saves the correspondence between the target LU and the LBA and the cache address in the target LU by default.
  • the cache device M recognizes the correspondence between the target LU, the LBA in the target LU, and a certain cache address of the cache device M.
  • Cache device M allocates cache address M for the data to be written from that segment of cache addresses.
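The correspondence that controller A establishes in step 308 can be pictured as a small lookup table. The following is a minimal sketch, not the patented implementation; it assumes a data address consists of an LU identifier, an LBA, and a length, and the names `DataAddress`, `CacheLocation`, and `ControllerMap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAddress:
    lu_id: str    # identifier of the target LU
    lba: int      # logical block address within the LU
    length: int   # length of the data in bytes


@dataclass(frozen=True)
class CacheLocation:
    device_id: str  # identifier of the cache device, e.g. "M"
    address: int    # cache address inside that device


class ControllerMap:
    """Hypothetical per-controller table: data address -> cache location."""

    def __init__(self):
        self._table = {}

    def record_write(self, addr, loc):
        # Called after the input/output manager reports a successful write.
        self._table[addr] = loc

    def lookup(self, addr):
        # Returns None when no correspondence has been established.
        return self._table.get(addr)


controller_a = ControllerMap()
addr = DataAddress("LU1", lba=2048, length=8192)
controller_a.record_write(addr, CacheLocation("M", 0x1000))
```

The controller only stores this metadata; the data itself travels from the input/output manager to the cache device without passing through the controller.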
  • Input/output manager A sends the data to be written to CPU A, and CPU A writes the data to be written into memory A.
  • CPU A reads the data to be written from memory A, and sends the data to be written to PCIe switch B through PCIe switch A.
  • PCIe switch B sends the data to be written to CPU B, and CPU B writes the data to be written into memory B.
  • To prevent the data to be written in cache device M from being lost, the storage array caches the data to be written on multiple cache devices. Therefore, the storage array shown in the figure also includes cache device N.
  • the controller A receives the data write request sent by the input and output manager A, determines that the cache device M caches the data to be written as the primary cache device, and the cache device N caches the data to be written as the backup cache device.
  • the controller A obtains the cache addresses allocated to the data to be written in the cache device M and the cache device N, respectively.
  • the controller A sends an instruction to the cache device M and the cache device N, respectively, for instructing the cache device M and the cache device N to allocate a cache address for the data to be written.
  • the instruction carries the length of the data to be written.
  • Cache device M allocates a cache address for the data to be written, which is called cache address M, and cache device N allocates a cache address for the data to be written, which is called cache address N.
  • Controller A obtains the cache address M and the cache address N.
  • the controller A sends the identifier of the cache device M and the cache address M to the input and output manager A through the PCIe switching device A, and sends the identifier of the cache device N and the cache address N to the input and output manager A through the PCIe switching device A.
  • the controller A can send the identifier of the cache device M and the cache address M, and the identifier of the cache device N and the cache address N to the input/output manager A through a message.
  • Cache device M allocates a dedicated segment of cache addresses to controller A, that is, the segment caches only data of LUs whose home controller is controller A.
  • Controller A directly allocates cache address M for the data to be written from that segment;
  • cache device N likewise allocates a dedicated segment of cache addresses to controller A, and controller A directly allocates cache address N for the data to be written from that segment of cache device N.
  • the input output manager A receives the identifier of the cache device M and the cache address M, and the identifier of the cache device N and the cache address N.
  • the input/output manager A writes the data to be written directly to the cache address M through the PCIe switching device A according to the identifier of the cache device M and the cache address M.
  • According to the identifier of cache device N and cache address N, input/output manager A writes the data to be written directly to cache address N through PCIe switching device A.
  • After receiving the response that the data to be written has been successfully written to cache address M, input/output manager A notifies controller A, and controller A establishes the correspondence between the data address to be written, the identifier of cache device M, and cache address M. Similarly, controller A establishes the correspondence between the data address to be written, the identifier of cache device N, and cache address N.
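The primary and backup caching described above can be sketched as follows. This is an illustrative model under stated assumptions: `CacheDevice` and `mirrored_write` are hypothetical names, allocation is a simple bump pointer, and the PCIe transfer is replaced by a direct method call.

```python
class CacheDevice:
    """Toy cache device that allocates addresses and stores bytes."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._next = 0
        self._store = {}

    def allocate(self, length):
        # Allocate a cache address for data of the given length.
        address = self._next
        self._next += length
        return address

    def write(self, address, data):
        self._store[address] = bytes(data)
        return True  # stands in for the write success response

    def read(self, address):
        return self._store[address]


def mirrored_write(data, primary, backup):
    # Controller role: obtain one cache address from each cache device.
    targets = [(dev, dev.allocate(len(data))) for dev in (primary, backup)]
    # Input/output manager role: write directly to both cache addresses.
    ok = all(dev.write(address, data) for dev, address in targets)
    return ok, [(dev.device_id, address) for dev, address in targets]


cache_m, cache_n = CacheDevice("M"), CacheDevice("N")
ok, locations = mirrored_write(b"payload", cache_m, cache_n)
```

In the sketch, as in the text, the controller contributes only the two addresses; the payload is written to both devices by the input/output manager.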
  • the controller A sends the identifier of the cache device M and the cache address M to the input and output manager A through the PCIe switching device A.
  • the input output manager A receives the identification of the cache device M and the cache address M.
  • According to the identifier of cache device M and cache address M, input/output manager A writes the data to be written directly to cache address M through PCIe switching device A or PCIe switching device B.
  • Controller A sends a data write command to cache device M through PCIe switching device A or PCIe switching device B; the data write command carries the identifier of cache device N and cache address N.
  • Cache device M caches the data to be written and, according to the data write command, writes the data to be written directly to cache address N through PCIe switching device A or PCIe switching device B.
  • Controller A only needs to obtain cache address M and cache address N allocated for the data to be written, and the data to be written can then be cached in cache device M and cache device N. This saves the computing resources of controller A's CPU and the memory resources of controller A, and improves data writing efficiency.
  • the input output manager A receives a data write request from the host.
  • the data write request carries the address of the data to be written.
  • Input/output manager A forwards the data write request to controller A through PCIe switching device A.
  • Controller A receives the data write request sent by input/output manager A, and determines, according to the identifier of the target LU carried in the data write request, that controller A is not the home controller of the target LU. The process is shown in FIG. 4.
  • Step 401 The host sends a data write request to the input output manager A.
  • the host sends a data write request to the input output manager A, and the data write request carries the data address to be written.
  • Step 402 Send a data write request to controller A.
  • the controller A is the home controller of the input and output manager A.
  • the input/output manager A receives the data write request and sends a data write request to the controller A through the PCIe switching device A or the PCIe switching device B.
  • This embodiment takes as an example the case in which input/output manager A receives the data write request and sends it to controller A through PCIe switching device A.
  • Step 403 Determine that controller A is not the home controller of the target LU.
  • the controller A receives the data write request sent by the input/output manager A, and determines that the controller A is not the home controller of the target LU according to the identifier of the target LU to be written in the data write request.
  • the controller A queries the correspondence between the controller and the LU, and determines that the controller B is the home controller of the target LU.
  • Step 404 Send a data write request to the controller B.
  • the controller A sends a data write request to the controller B through the PCIe switching device A or the PCIe switching device B.
  • This embodiment takes an example of forwarding a data write request to the controller B through the PCIe switching device B.
  • Step 405 Acquire a cache address of the data to be written.
  • the controller B receives the data write request sent by the controller A, and determines the cache device that caches the data to be written. In the embodiment of the present invention, it is the cache device M. For a specific implementation manner, reference may be made to the manner in which the controller A obtains the data cache address to be written from the cache device M.
  • Step 406 Send the identifier of the cache device M and the cache address M to the controller A.
  • the controller B obtains the cache address M, and sends the identifier of the cache device M and the cache address M to the controller A through the PCIe switching device B.
  • Alternatively, the identifier of cache device M and cache address M may be sent directly to input/output manager A through PCIe switching device A or PCIe switching device B.
  • Step 407 Send the identifier of the cache device M and the cache address M to the input and output manager A.
  • Controller A receives the identifier of cache device M and cache address M sent by controller B, and sends the identifier of cache device M and cache address M of the data to be written to input/output manager A through a PCIe switching device.
  • Step 408 The host sends the data to be written to the input and output manager A.
  • the input output manager A receives the identifier of the cache device M and the cache address M, and responds to the data write request sent by the host. The host sends the data to be written to the input and output manager A.
  • Step 409 Write data to be written to the cache address M.
  • the input/output manager A receives the data to be written sent by the host, and writes the data to be written directly to the cache address M through the PCIe switching device A according to the identifier of the cache device M and the cache address M.
  • the input/output manager A receives the write success response of the data to be written sent by the cache device M through the PCIe switching device A.
  • the input output manager A sends a data write request completion response to the host, informing the host that the write request operation is completed.
  • Step 410 Notify controller B that the data to be written has been written to cache address M.
  • After successfully writing the data to be written to cache address M, input/output manager A sends a notification that the data to be written has been written to cache address M. Specifically, input/output manager A sends the notification to controller A through PCIe switching device A, and controller A forwards the notification to controller B through PCIe switching device B. Alternatively, input/output manager A sends the notification directly to controller B through PCIe switching device A or PCIe switching device B.
  • Step 411 The controller B establishes a correspondence between the data address to be written, the cache device M, and the cache address M.
  • the controller B receives the notification sent by the input/output manager A, and establishes a correspondence between the data address to be written, the cache device M, and the cache address M.
  • Cache device M establishes a correspondence between the data address to be written and cache address M. For details, refer to the description of the previous embodiment; they are not described herein again.
  • Cache device N allocates cache address N for the data to be written, and establishes a correspondence between the data address to be written and cache address N.
  • the cache device N can obtain the data address to be written from the cache address allocation instruction sent by the controller A, and after the cache device N allocates the cache address N, establish a correspondence between the data address to be written and the cache address N.
  • To prevent the data to be written cached in cache device M from being lost, the data to be written may need to be cached on multiple cache devices. When controller A is not the home controller of the target LU of the data to be written, the process in which input/output manager A sends the data write request to controller B is as described in the foregoing embodiment.
  • Controller B obtains the cache addresses in the same way as in the scenario in which controller A is the home controller of the target LU of the data to be written and controller A acquires the cache addresses of multiple cache devices.
  • Other steps may also be described with reference to the foregoing embodiments, and details are not described herein again.
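The routing decision in steps 403 and 404 can be sketched as a table lookup. The mapping `LU_HOME` and the function `route_write_request` are hypothetical names used only for illustration; the source does not specify how the correspondence between controllers and LUs is stored.

```python
# Hypothetical correspondence between LUs and their home controllers.
LU_HOME = {"LU1": "A", "LU2": "B"}


def route_write_request(receiving_controller, lu_id, lu_home=LU_HOME):
    """Return the controllers a data write request visits, in order."""
    home = lu_home[lu_id]
    if home == receiving_controller:
        # The receiving controller is the home controller of the target LU.
        return [receiving_controller]
    # Otherwise the request is forwarded to the home controller.
    return [receiving_controller, home]


path_local = route_write_request("A", "LU1")
path_forwarded = route_write_request("A", "LU2")
```

Only the request metadata follows this path. The data to be written still goes from input/output manager A straight to the cache address returned by the home controller.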
  • After the host writes data to the storage array, the host may access the written data, that is, send a data read request. The specific process is shown in FIG. 5:
  • Step 501 Send a data read request.
  • the host sends a data read request to the input/output manager A, and the data read request carries the data address to be read.
  • the data address to be read includes the identifier of the logical unit LU where the data to be read is located, the LBA of the data to be read, and the length of the data to be read.
  • the host can send the data read request to the input and output manager A through the SCSI protocol, which is not limited by the present invention.
  • the data to be read here is the data to be written described above.
  • Step 502 Send a data read request to controller A.
  • the input output manager A receives the data read request sent by the host, and sends a data read request to the controller A through the PCIe switching device A.
  • Step 503 Controller A sends the identifier of cache device M and cache address M to input/output manager A.
  • When controller A is the home controller of the LU where the data to be read is located, and the data to be read is cached in a cache device, such as cache device M, controller A queries, according to the data read request, the correspondence between the data address to be read, the identifier of the cache device, and the cache address, and determines cache address M at which the data to be read is cached in cache device M.
  • the cache address of the data to be read in the cache device M is the cache address M.
  • the controller A sends the identifier of the cache device M and the cache address M to the input and output manager A through the PCIe switching device A.
  • Step 504 Read data to be read from the cache address M.
  • the input/output manager A reads the data to be read directly from the cache address M through the PCIe switching device A according to the identifier of the cache device M and the cache address M.
  • Step 505 Return the data to be read.
  • the input output manager A reads the data to be read from the cache address M, and returns the read data to be read to the host.
  • When controller A is not the home controller of the LU where the data to be read is located, controller A queries the correspondence between LUs and home controllers, and determines that controller B is the home controller of the LU where the data to be read is located.
  • Controller A sends a query request for the data to be read to controller B through PCIe switching device B.
  • The data to be read is again taken to be the data to be written described above.
  • the address of the data to be read is the data address to be written as described above.
  • the cache address of the data to be read in the cache device M is the cache address M.
  • Controller B queries the correspondence between the data address to be written, the identifier of cache device M, and cache address M, determines the identifier of cache device M that caches the data to be read and cache address M, and sends the identifier of cache device M and cache address M to controller A through PCIe switching device B. Controller A then sends the identifier of cache device M and cache address M to input/output manager A through PCIe switching device A.
  • the controller B can also send the identifier of the cache device M and the cache address M to the input and output manager A directly through the PCIe switching device A or the PCIe switching device B.
  • For the subsequent read operation, refer to the read operation of the previous embodiment; details are not described herein again.
  • The data to be read is still taken to be the data to be written described above.
  • The address of the data to be read is the data address to be written described above.
  • The home controller of the LU where the data to be read is located queries the correspondence between the data address to be read (the data address to be written) and the target SSD storage address of the data to be read, obtains the target SSD storage address of the data to be read, and sends the target SSD storage address of the data to be read to input/output manager A through PCIe switching device A or PCIe switching device B.
  • the target SSD storage address of the data to be read includes the controller identifier of the target SSD and the logical address of the data to be read in the target SSD.
  • According to the target SSD storage address of the data to be read, input/output manager A reads the data to be read directly from the logical address in the target SSD through PCIe switching device A or PCIe switching device B.
  • According to the cache address of the data to be read in the cache device, input/output manager A reads data directly from the cache address through PCIe switching device A or PCIe switching device B; according to the controller identifier of the target SSD and the logical address of the data to be read in the target SSD, input/output manager A reads data directly from the logical address in the target SSD through PCIe switching device A or PCIe switching device B. Details are not described herein again.
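The read path above amounts to resolving a data address to either a cache location or a target SSD location. The sketch below assumes the home controller keeps two lookup tables and checks the cache table first; both the table layout and the name `resolve_read` are assumptions made for illustration.

```python
def resolve_read(data_address, cache_map, ssd_map):
    """Return where the input/output manager should read the data from."""
    if data_address in cache_map:
        device_id, cache_address = cache_map[data_address]
        return ("cache", device_id, cache_address)
    if data_address in ssd_map:
        ssd_controller_id, logical_address = ssd_map[data_address]
        return ("ssd", ssd_controller_id, logical_address)
    raise KeyError(data_address)


# Data addresses are modelled as (LU identifier, LBA, length) tuples.
cache_map = {("LU1", 0, 4096): ("M", 16)}
ssd_map = {("LU1", 8, 4096): ("ssd-ctrl-1", 77)}

from_cache = resolve_read(("LU1", 0, 4096), cache_map, ssd_map)
from_ssd = resolve_read(("LU1", 8, 4096), cache_map, ssd_map)
```

In both cases the controller returns only a location, and the input/output manager performs the read through a PCIe switching device.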
  • The home controller of the LU where the data to be read is located normally returns to input/output manager A the identifier of primary cache device M, which caches the data to be read, and cache address M.
  • the host sends a data write request to the input and output manager A, the data write request carrying the data address to be written.
  • the input output manager A sends a data write request to the controller A through the PCIe switching device A.
  • When controller A is the home controller of the target LU to which the data is to be written, controller A supplies input/output manager A with the identifier of cache device M and cache address M.
  • the input/output manager A writes the data to be written directly to the cache address M through the PCIe switching device A or the PCIe switching device B according to the identifier of the cache device M and the cache address M.
  • The data to be written cached in cache device M is deduplicated before being stored in the SSDs of the storage array, which effectively saves storage space and improves storage space utilization.
  • That is, the data stored in the SSDs of the storage array is data from cache device M that has been deduplicated before being stored.
  • Deduplication technology divides data into data blocks according to predetermined rules and calculates the feature values of each data block. Calculating the data block feature value usually uses a hash algorithm to perform Hash calculation on the data block to obtain a hash value.
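  • Commonly used hash algorithms include MD5, SHA-1, SHA-256, and SHA-512.
  • When data block B is found to be a duplicate of data block A already stored in the SSD, data block B is not stored again; the logical storage address of data block A in the SSD is used as the logical storage address of data block B in the SSD.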
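Computing a feature value with one of the listed hash algorithms can be sketched with Python's standard `hashlib` module. The function name `feature_value` and the choice of SHA-256 as the default are assumptions; the source does not fix a particular algorithm.

```python
import hashlib


def feature_value(block: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a data block as its feature value."""
    return hashlib.new(algorithm, block).hexdigest()


block_a = b"\x00" * 4096
block_b = b"\x00" * 4096   # same content as block_a
block_c = b"\x01" * 4096   # different content

# Identical blocks produce identical feature values.
same = feature_value(block_a) == feature_value(block_b)
# Blocks with different content produce different feature values here.
different = feature_value(block_a) != feature_value(block_c)
```

Comparing the short digests instead of the full blocks is what makes duplicate detection cheap enough to run for every data block.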
  • the comparison of the data block feature values is implemented by a controller. Because of the deduplication in the storage array, each unique block has a feature value, so a large number of feature values are produced.
  • Each controller is responsible for comparing part of the data block feature values. In this way, according to the data block feature value distribution algorithm, each controller maintains only the feature value indexes of part of the unique data stored in the storage array; these feature value indexes are called a feature value index set.
  • the controller queries the feature value index set for the feature value of the data block to be written into the SSD, and determines whether it is the same as a certain feature value in the feature value index set.
  • According to the feature value distribution algorithm, controller A maintains feature value index set A, and controller A is called the home controller of each feature value in feature value index set A.
  • If a feature value in feature value index set A is the same as the feature value of data block X, controller A is the home controller of the feature value of data block X, just as it is the home controller of every feature value in feature value index set A.
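The source does not specify the feature value distribution algorithm. As a purely illustrative assumption, the sketch below maps a feature value to a home controller by taking the feature value, interpreted as an integer, modulo the number of controllers; `CONTROLLERS` and `home_controller` are hypothetical names.

```python
CONTROLLERS = ["A", "B"]  # hypothetical list of controllers in the array


def home_controller(feature_value_hex, controllers=CONTROLLERS):
    """Pick the home controller of a feature value (assumed modulo rule)."""
    return controllers[int(feature_value_hex, 16) % len(controllers)]


home_of_00 = home_controller("00")
home_of_01 = home_controller("01")
home_of_ff = home_controller("ff")
```

Any deterministic rule works as long as every controller applies the same one, so that a given feature value is always looked up in the same feature value index set.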
  • The feature value index set is composed of individual feature value indexes, as shown in FIG. 6.
  • Taking the index of feature value 1 as an example, the index includes feature value 1, data block storage address 1, and a reference count.
  • Data block storage address 1 indicates the storage address of a unique data block C in SSD A or the storage address of data block C in the cache device.
  • The storage address of data block C in SSD A may contain the identifier of the controller of SSD A and the logical storage address at which data block C is stored in SSD A.
  • The storage address of data block C in the cache device includes the identifier of the cache device and the cache address.
  • Feature value 1 represents the feature value of data block C.
  • The reference count indicates the number of data blocks whose feature value is feature value 1.
  • the data block address in the feature value index is the storage address of the data block in the cache device or the target hard disk storage address of the data block.
  • the storage address of the data block in the cache device includes an identifier of the cache device and a cache address of the data block in the cache device;
  • the target hard disk storage address of the data block includes an identifier of the target hard disk controller and the logical storage address at which the data block is stored in the target hard disk.
  • The feature value index shown in FIG. 6 is only an exemplary implementation. The index may also be a multi-level index, and any index form usable for deduplication may be used in the embodiment of the present invention.
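A single-level feature value index of the kind described above can be modelled as a record with three fields. The class name `FeatureValueIndex`, the tuple layout of the storage address, and the dictionary used for the index set are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FeatureValueIndex:
    feature_value: str                # feature value of the unique data block
    # (cache device id, cache address) or (SSD controller id, logical address)
    storage_address: Tuple[str, int]
    reference_count: int = 1          # number of data blocks with this value


# A feature value index set keyed by feature value (hypothetical layout).
index_set_a = {
    "fv1": FeatureValueIndex("fv1", ("ssd-ctrl-A", 4096)),
}

# A second data block with feature value "fv1" raises the reference count.
index_set_a["fv1"].reference_count += 1
```

Keying the set by feature value makes the duplicate check a single lookup.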
  • The following takes as an example the case in which controller A is the home controller of the target LU of the data blocks cached in cache device M.
  • the controller A obtains the identifier of the cache device M and the cache address M.
  • According to the identifier of cache device M and cache address M, input/output manager A writes the data to be written to cache address M through PCIe switching device A or PCIe switching device B.
  • the controller A establishes a correspondence between the data address to be written, the identifier of the cache device M, and the cache address M.
  • The data to be written at cache address M is taken as an example.
  • To deduplicate the data, the feature values of its data blocks need to be calculated.
  • the data needs to be divided into data blocks according to certain rules. There are two methods for dividing a data block: dividing the data into fixed-length data blocks, or dividing the data into variable-length data blocks.
  • the embodiment of the present invention takes a data block divided into fixed lengths as an example, such as dividing data into data blocks of 4 KB size.
  • The data to be written at cache address M is divided into a plurality of data blocks of 4 KB each.
  • Controller A records the identifier of the LU of each data block, the LBA of the data block, and the length of the data block.
  • The identifier of the LU of a data block, the LBA of the data block, and the length of the data block together form the data block address.
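Fixed-length division and the resulting data block addresses can be sketched as below. The function name `split_into_blocks` is hypothetical, and the 512-byte sector size used to advance the LBA is an assumption that the source does not state.

```python
BLOCK_SIZE = 4096  # 4 KB fixed-length data blocks, as in the embodiment


def split_into_blocks(lu_id, start_lba, data,
                      block_size=BLOCK_SIZE, sector_size=512):
    """Split data into blocks; return ((lu_id, lba, length), chunk) pairs."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        # Assumed: the LBA advances by one per sector_size bytes.
        lba = start_lba + offset // sector_size
        blocks.append(((lu_id, lba, len(chunk)), chunk))
    return blocks


blocks = split_into_blocks("LU1", 100, bytes(10000))
addresses = [address for address, _ in blocks]
```

With 10000 bytes of input, the last block is shorter than 4 KB; a real implementation would need a policy for such tails, which this sketch does not address.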
  • Controller A sends a data block feature value request to cache device M through PCIe switching device A or PCIe switching device B; the feature value request includes the address of data block X.
  • the cache device M sends the feature value of the data block X to the controller A through the PCIe switching device A or the PCIe switching device B to perform deduplication.
  • the specific process is as shown in FIG. 7 and includes:
  • Step 701 The cache device M calculates the feature value of the data block X.
  • Controller A sends an instruction for acquiring the feature value of data block X to cache device M; the instruction carries the address of data block X.
  • Cache device M receives the instruction, sent by controller A, for acquiring the feature value of data block X.
  • Cache device M stores the correspondence between the address of data block X and cache address B, determines data block X according to the address of data block X carried in the instruction, and calculates the feature value of data block X.
  • Cache device M obtains the feature value of data block X and caches it at cache address X.
  • Step 702 Send the feature value of the data block X to the controller A.
  • the cache device M obtains the feature value of the data block X, and sends the feature value response message of the data block X to the home controller A of the LU where the data block X is located.
  • the feature value response message of the data block X carries the feature value of the data block X.
  • the feature value response message of the data block X also carries the identifier of the cache device M that caches the feature value of the data block X and the cache address X of the feature value of the data block X in the cache device M.
  • Step 703 Determine a home controller of the feature value of the data block X according to the feature value distribution algorithm.
  • Step 704 The controller A queries the local feature value index set A.
  • When controller A is the home controller of the feature value of data block X, controller A queries the local feature value index set A, and determines whether feature value index set A contains a feature value identical to the feature value of data block X.
  • When such a feature value exists, steps 705a and 706a are performed.
  • For example, the feature value of data block X is the same as feature value 1, that is, data block X is the same as data block A.
  • Step 705a Controller A updates the reference count in the feature value 1 index.
  • For example, the reference count in the index of feature value 1 is 1, that is, only data block A in the storage array has feature value 1. When the feature value of data block X is found to be the same as feature value 1, the reference count is updated to 2.
  • Step 706a The controller A notifies the cache device M to delete the data block X.
  • the controller A notifies the cache device M to delete the data block X.
  • the controller A establishes a correspondence relationship between the data block X address and the feature value of the data block X.
  • the controller A establishes a correspondence relationship between the data block X address, the feature value of the data block X, and the storage address of the data block A.
  • It is determined in step 704 that data block X is a duplicate data block. Therefore, data block X no longer needs to be saved to the SSD, and cache device M is notified to delete data block X.
  • When feature value index set A contains no feature value identical to the feature value of data block X, steps 705b, 706b, 707, 708, 709, and 710 are performed.
  • Step 705b Acquire cache address B at which cache device M caches data block X.
  • According to cache address X of the feature value of data block X in cache device M, controller A acquires cache address B of data block X from cache device M through PCIe switching device A.
  • Step 706b Send the identifier of the cache device M and the cache address B to the controller of the target SSD.
  • the controller A obtains the identifier of the cache device M and the cache address B, and sends the identifier of the cache device M and the cache address B to the controller of the target SSD through the PCIe switching device A or the PCIe switching device B.
  • Step 707 The controller of the target SSD reads the data block X from the cache address B.
  • The controller of the target SSD receives the identifier of cache device M and cache address B, and reads data block X directly from cache address B through PCIe switching device A or PCIe switching device B according to the identifier of cache device M and cache address B.
  • Step 708 The controller of the target SSD sends the target SSD storage address of the data block X to the controller A.
  • the controller of the target SSD reads the data block X from the cache address B and stores the data block X in the target SSD.
  • the controller of the target SSD transmits the target SSD storage address of the data block X to the controller A through the PCIe switching device A.
  • the target SSD storage address of the data block X includes the controller identifier of the target SSD and the logical storage address of the storage data block X in the target SSD.
  • Step 709 The controller A establishes an index of the feature values of the data block X.
  • the controller A receives the target SSD storage address of the data block X, establishes an index of the feature value of the data block X, and sets the reference count to 1.
  • the controller A establishes the correspondence between the address of the data block X, the feature value of the data block X, and the target SSD storage address of the data block X.
  • Controller A also records cache address X of the feature value of data block X. When the feature value of data block X is stored in the SSD, controller A also records the target SSD storage address of the feature value of data block X.
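The two branches that follow the index query in step 704 can be condensed into one function. This is a sketch under stated assumptions: the index set is a plain dictionary, `store_block` is a hypothetical callback standing in for steps 705b to 708 (the target SSD controller reading the block from the cache and returning its storage address), and the returned action strings are illustrative.

```python
def deduplicate(index_set, feature_value, store_block):
    """Handle one data block at the home controller of its feature value."""
    entry = index_set.get(feature_value)
    if entry is not None:
        # Step 705a: duplicate found, update the reference count.
        entry["reference_count"] += 1
        # Step 706a: the cached copy can be deleted, nothing is stored.
        return "delete_from_cache", entry["storage_address"]
    # Steps 705b to 708: unique block, have it stored in the target SSD.
    storage_address = store_block()
    # Step 709: build the index of the feature value with reference count 1.
    index_set[feature_value] = {
        "storage_address": storage_address,
        "reference_count": 1,
    }
    return "stored", storage_address


index_set = {}
first = deduplicate(index_set, "fv-x", lambda: ("ssd-ctrl-1", 10))
second = deduplicate(index_set, "fv-x", lambda: ("ssd-ctrl-1", 11))
```

The second call never invokes its callback, which mirrors the text: a duplicate data block is not written to the SSD at all.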
  • In another case, controller A is not the home controller of the feature value of data block X; it is only the home controller of the LU where data block X is located.
  • For example, controller B is the home controller of the feature value of data block X.
  • the controller A sends the feature value of the data block X to the controller B through the PCIe switching device A or the PCIe switching device B.
  • the controller B receives the feature value of the data block X sent by the controller A, and queries the feature value index set B of the controller B.
  • Controller B queries feature value index set B and finds a feature value that is the same as the feature value of data block X.
  • For example, the feature value of data block R is the same as the feature value of data block X.
  • Controller B notifies cache device M to delete data block X. Specifically, controller B sends a delete command to controller A through PCIe switching device B.
  • the controller A sends the delete command to the cache device M through the PCIe switching device A, and the cache device M deletes the data block X.
  • the controller B updates the reference count of the index having the same feature value as the data block X, that is, the reference count is incremented by one.
  • The storage address of data block R in the index of data block R includes the controller identifier of the SSD storing data block R and the logical storage address at which data block R is stored in the SSD.
  • Alternatively, the storage address of data block R in the index of data block R includes the identifier of the cache device and the cache address.
  • Controller A establishes a correspondence between the address of data block X, the feature value of data block X, and the address of controller B, the home controller of the feature value of data block X. In this way, controller A does not need to store the correspondence between each data block address, the feature value of the data block, and the data block storage address, which effectively reduces the amount of data stored by controller A.
  • the controller A establishes the correspondence between the data block X address, the feature value of the data block X, and the data block R storage address.
  • the controller A can directly query the data block X address and the data block X.
  • When controller B finds that feature value index set B contains no feature value that is the same as the feature value of data block X, controller B sends, through PCIe switching device B, a request to controller A for obtaining cache address B of data block X in cache device M. Controller A sends the request to cache device M through PCIe switching device A. Cache device M sends the identifier of cache device M and cache address B to controller B.
  • the controller B sends the identifier of the cache device M and the cache address B to the controller of the target SSD through the PCIe switching device A or the PCIe switching device B (here, the PCIe switching device A is taken as an example).
  • the controller of the target SSD reads the data block X directly from the cache address B through the PCIe switching device A or the PCIe switching device B according to the identifier of the cache device M and the cache address B, and stores the data block X in the target SSD.
  • the controller of the target SSD transmits the target SSD storage address of the data block X to the controller B through the PCIe switching device A or the PCIe switching device B.
  • the controller B receives the target SSD storage address of the data block X, establishes the feature value index of the data block X, and sets the reference count in the index to 1.
  • the controller B also records the cache address X of the feature value of the data block X.
  • the controller B also records the storage address of the feature value of the data block X in the SSD.
  • the controller B receives the target SSD storage address of the data block X and sends a notification to the controller A.
  • the notification carries the target SSD storage address of the data block X.
  • the controller A establishes the correspondence between the address of the data block X, the feature value, and the target SSD storage address of the data block X according to the notification sent by the controller B.
  • the controller A establishes the data block X address and the data block. The correspondence between the feature value of X and the address of controller B.
  • In the storage array of this embodiment, the cache device computes the fingerprint of data block X, which saves the computing resources of the controller.
  • While data block X is being stored to the target SSD, the controller only supplies the identifier of cache device M and cache address B, and the controller of the target SSD reads data block X directly from cache address B. This saves the controller's computing and memory resources and improves the performance of the storage array.
  • In the storage array shown in FIG. 2, data is written to the SSD according to the deduplication operation described above.
  • When input/output manager A receives a data read request, reading data block X is used as the example.
  • The data read request carries the data block X address.
  • Input/output manager A sends the data read request to controller A through PCIe switching device A.
  • Controller A determines that it is the home controller of the LU where data block X is located. In one implementation, controller A looks up the correspondence among the address of the data block X to be read, the feature value of data block X, and the target SSD storage address of data block X, and determines the target SSD storage address of data block X.
  • Controller A sends the target SSD storage address of data block X to input/output manager A through PCIe switching device A.
  • According to the target SSD storage address of data block X, input/output manager A reads data block X directly from its logical address in the target SSD through PCIe switching device A or B.
  • In another implementation, controller A looks up the correspondence among the data block X address, the feature value of data block X, and the address of the home controller of that feature value, and determines home controller B.
  • Controller A then queries, in controller B, the feature value index of data block X (or of the data block that has the same feature value as data block X) to determine the storage address, and the data is read from that storage address.
  • When controller A is both the home controller of the LU where data block X is located and the home controller of the feature value of data block X, in another implementation controller A looks up the correspondence between the address of the data block X to be read and its feature value, queries feature value index set A (which controller A maintains) with that feature value to determine the storage address of data block X, and sends the storage address to input/output manager A.
  • Input/output manager A reads the data block from the storage address of data block X through PCIe switching device A or B.
  • In another implementation, the input/output managers and the controllers may have no home relationship; that is, controller A is not the home controller of input/output manager A.
  • Each input/output manager stores the correspondence between each LU and the home controller of that LU.
  • According to the target LU identifier carried in a data operation request, the input/output manager queries that correspondence, determines the home controller of the target LU, and sends the request to it directly through PCIe switching device A or B.
  • In the embodiments, the logical storage address of data block X in the target hard disk, as contained in the target hard disk storage address, is the logical block address at which data block X is stored in the target hard disk. In this embodiment specifically, it is the logical block address at which data block X is stored in the target SSD.
  • Any input/output manager is connected through any switching device to any controller, to any hard disk, or to any cache device.
  • Any controller is connected through any switching device to any other controller, to any hard disk, or to any cache device.
  • Any cache device is connected to any hard disk through a switching device. A connection between any two devices through any switching device supports two-way communication.
  • Logically, the controllers are collectively called the controller plane, the switching devices the switching plane, the hard disks the storage plane, the input/output managers the input/output management plane, and the cache devices the cache plane.
  • In this architecture, data read/write control is separated from the data itself. The controller implements read/write control, while the data being read or written does not flow through the controller. This saves the computing resources of the controller CPU and the memory resources of the controller, improves data writing efficiency, and improves the data processing efficiency of the storage array.
  • The storage array architecture of the embodiments allows devices such as controllers and hard disks to be expanded: controllers, switching devices, hard disks, and so on can be added flexibly according to the performance requirements of the storage array.
  • The technical solution of the embodiments also applies to a storage array that includes one input/output manager, one controller, one switching device, one cache device, and several hard disks.
  • In that scenario, data is written to the storage array as described in the foregoing embodiments.
  • Data deduplication in the storage array is performed as described in the foregoing embodiments.
  • Data reading in the storage array is performed as described in the foregoing embodiments.
  • The storage array may also include two controllers and one switching device, with both controllers connected to the switching device. Data writing, deduplication, and data reading in this scenario also follow the foregoing embodiments and are not described again here.
  • In the embodiments, device A, according to the identifier of device B and cache address A, reads data from cache address A through PCIe switching device A or B (also described as reading directly from cache address A), or writes data to cache address A (also described as writing directly to cache address A).
  • This can be implemented with Direct Memory Access (DMA) technology.
  • Device A and device B stand for the devices that actually perform DMA access in the embodiments.
  • The controller obtains the cache address in device B and sends the identifier of device B and that cache address to device C through PCIe switching device A or B.
  • Because the controller communicates with device B through PCIe switching device A or B to obtain the cache address, it already knows the identifier of device B; once it has the cache address, it can send both the identifier and the cache address of device B to device C.
  • The controller may also obtain the identifier of device B together with the cache address.
  • The identifier of device B may be the address of device B or any other identifier that uniquely identifies the device.
  • The disclosed systems and methods can be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; an actual implementation may divide them differently, for example by combining multiple units or components, integrating them into another system, or omitting or not executing some features.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the embodiment.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
  • If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable non-volatile storage medium.
  • The medium includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The non-volatile storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

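A data deduplication method and a storage array. A controller is connected to a cache device through a switching device. The cache device calculates the feature value of a data block to be deduplicated, and the controller queries a data block feature value index set with that feature value. When no identical feature value is found, the controller sends the cache address of the data block in the cache device to the controller of a target hard disk, and the controller of the target hard disk reads the data block to be deduplicated from that cache address.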

Description

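Data Deduplication Method and Storage Array
Technical Field
The present invention relates to the field of information technology, and in particular to a data deduplication method and a storage array.
Background
A storage array generally includes one engine, and an engine includes two controllers; this is commonly called a dual-controller structure. As shown in FIG. 1, the storage array includes input/output manager A, input/output manager B, controller A, and controller B. Input/output manager A is connected to controller A, and input/output manager B is connected to controller B. Controller A includes Peripheral Component Interconnect Express (PCIe) switch A, central processing unit (CPU) A, and memory A; controller B includes PCIe switch B, CPU B, and memory B. PCIe switch A is connected to PCIe switch B. In the storage array shown in FIG. 1, deduplication is performed before the data to be written is written to a hard disk. Specifically, CPU A of controller A divides the data to be written in memory A into multiple data blocks and calculates the feature value of each block. By looking up feature values in the feature value index set of controller A, it determines whether a block is a duplicate. A duplicate block is deleted; a non-duplicate block is written to the hard disk.
This deduplication process consumes the computing capacity of the controller's CPU and the controller's memory resources, and seriously degrades the performance of the storage array.
Summary
Embodiments of the present invention provide a data deduplication method and a storage array.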
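In a first aspect, an embodiment of the present invention provides a data deduplication method applied to a storage array. The storage array includes a switching device, a first controller, and a cache device; the first controller is connected to the switching device; the cache device is connected to the switching device; and the switching device is connected to the hard disks in the storage array. The method includes:
the first controller receives the feature value of a data block to be deduplicated from the cache device and searches a data block feature value index set for that feature value;
when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller obtains, through the switching device, the cache address of the data block to be deduplicated in the cache device;
the first controller sends, through the switching device, a data read instruction to the controller of a target hard disk; the data read instruction carries the identifier of the cache device and the cache address;
the controller of the target hard disk reads, through the switching device, the data block to be deduplicated from the cache address according to the identifier of the cache device and the cache address;
the controller of the target hard disk stores the data block to be deduplicated in the target hard disk.
With reference to the first aspect, in a first possible implementation, the method further includes:
the controller of the target hard disk sends, through the switching device, a target hard disk storage address to the first controller; the target hard disk storage address includes the controller identifier of the target hard disk and the logical storage address at which the data block to be deduplicated is stored in the target hard disk;
the first controller establishes a feature value index of the data block to be deduplicated in the data block feature value index set; the feature value index includes the feature value of the data block to be deduplicated and the target hard disk storage address.
With reference to the first aspect, in a second possible implementation, the storage array further includes a second controller connected to the switching device;
the second controller stores the address of the data block to be deduplicated and is the home controller of the target logical unit in which that data block is located; the first controller receiving the feature value of the data block to be deduplicated from the cache device specifically includes:
the cache device sends, through the switching device, the feature value of the data block to be deduplicated to the second controller;
the second controller determines that the home controller of the feature value of the data block to be deduplicated is the first controller;
the second controller sends, through the switching device, the feature value of the data block to be deduplicated to the first controller.
With reference to the second possible implementation of the first aspect, in a third possible implementation, when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the method further includes: the first controller sends, through the switching device, a notification to the second controller, the notification carrying the target hard disk storage address;
the second controller establishes, according to the notification, the correspondence among the address of the data block to be deduplicated, its feature value, and the target hard disk storage address.
With reference to the second possible implementation of the first aspect, in a fourth possible implementation, the method further includes: the second controller establishes the correspondence among the address of the data block to be deduplicated, its feature value, and the address of the first controller.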
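In a second aspect, an embodiment of the present invention provides a storage array. The storage array includes a switching device, a first controller, and a cache device; the first controller is connected to the switching device; the cache device is connected to the switching device; and the switching device is connected to the hard disks in the storage array;
the first controller is configured to receive the feature value of a data block to be deduplicated from the cache device and to search a data block feature value index set for that feature value;
when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller is further configured to obtain, through the switching device, the cache address of the data block to be deduplicated in the cache device;
the first controller is further configured to send, through the switching device, a data read instruction to the controller of a target hard disk; the data read instruction carries the identifier of the cache device and the cache address;
the controller of the target hard disk is configured to read, through the switching device, the data block to be deduplicated from the cache address according to the identifier of the cache device and the cache address;
the controller of the target hard disk is further configured to store the data block to be deduplicated in the target hard disk.
With reference to the second aspect, in a first possible implementation, the controller of the target hard disk is further configured to send, through the switching device, a target hard disk storage address to the first controller; the target hard disk storage address includes the controller identifier of the target hard disk and the logical storage address at which the data block to be deduplicated is stored in the target hard disk;
the first controller is further configured to establish a feature value index of the data block to be deduplicated in the data block feature value index set; the feature value index includes the feature value of the data block to be deduplicated and the target hard disk storage address.
With reference to the second aspect, in a second possible implementation, the storage array further includes a second controller connected to the switching device; the second controller is configured to store the address of the data block to be deduplicated and is the home controller of the target logical unit in which that data block is located; the first controller receiving the feature value of the data block to be deduplicated from the cache device specifically includes:
the cache device sends, through the switching device, the feature value of the data block to be deduplicated to the second controller;
the second controller determines that the home controller of the feature value of the data block to be deduplicated is the first controller;
the second controller sends, through the switching device, the feature value of the data block to be deduplicated to the first controller.
With reference to the second possible implementation of the second aspect, in a third possible implementation, when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller is further configured to send, through the switching device, a notification to the second controller, the notification carrying the target hard disk storage address;
the second controller is further configured to establish, according to the notification, the correspondence among the address of the data block to be deduplicated, its feature value, and the target hard disk storage address.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation, the second controller is further configured to establish the correspondence among the address of the data block to be deduplicated, its feature value, and the address of the first controller.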
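In the data deduplication method and storage array provided by the embodiments of the present invention, the controller and the cache device are connected through a switching device. The first controller receives the feature value of the data block to be deduplicated from the cache device and searches the data block feature value index set for it. When no identical feature value is found, the first controller sends the cache address of the data block in the cache device to the controller of the target hard disk, which reads the data block from that cache address. Because the cache device computes the fingerprint of the data block to be deduplicated, the computing resources of the controller are saved. While the data block is being stored to the target hard disk, the controller only supplies its cache address, and the controller of the target hard disk reads the block directly from that address, which saves the controller's computing and memory resources and improves the performance of the storage array.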
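Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and other drawings may be derived from them.
FIG. 1 is a structural diagram of a prior-art storage array;
FIG. 2 is a structural diagram of a storage array according to an embodiment of the present invention;
FIG. 3 is a flowchart of data write request processing according to an embodiment of the present invention;
FIG. 4 is a flowchart of data write request processing according to an embodiment of the present invention;
FIG. 5 is a flowchart of data read request processing according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data block feature value index set;
FIG. 7 is a flowchart of data deduplication processing according to an embodiment of the present invention.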
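Detailed Description
The technical solutions in the embodiments of the present invention are described clearly below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained on the basis of the embodiments provided here fall within the protection scope of the present invention.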
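The storage array provided by an embodiment of the present invention, shown in FIG. 2, includes input/output manager A, controller A, input/output manager B, controller B, switching device A, switching device B, and cache device M. Controller A includes CPU A and memory A, which communicate over a bus; controller B includes CPU B and memory B, which communicate over a bus. Input/output manager A is connected to both switching device A and switching device B, and so is input/output manager B. Switching device A and switching device B are interconnected, and both are connected to cache device M, which is described in detail below. Controller A is connected to both switching devices, and so is controller B. Input/output manager A, input/output manager B, controller A, and controller B thus form a fully interconnected architecture around switching devices A and B. In the storage array of FIG. 2, switching device A is connected to all hard disks, and so is switching device B. Controller A and controller B each communicate with all hard disks shown in FIG. 2: controller A through switching device A, and controller B through switching device B. Controller A virtualizes the hard disks to form logical unit LU A and provides it to host A. Host A mounts LU A and accesses its data through controller A; LU A is said to belong to controller A, that is, controller A is the home controller of LU A. Likewise, controller B virtualizes the hard disks to form logical unit LU B and provides it to host B. Host B mounts LU B and accesses its data through controller B; LU B is said to belong to controller B, that is, controller B is the home controller of LU B. A host here may be a physical host (physical server) or a virtual host (virtual server). In the industry, a logical unit (LU) is usually referred to by its logical unit number (LUN). Assigning a LUN to a host actually means assigning the identifier of an LU to the host so that the host can mount that LU; LU and LUN therefore have the same meaning here. In the storage array of FIG. 2, switching devices A and B may be PCIe switching devices, Non-Volatile Memory Express (NVMe) switching devices, Serial Attached SCSI (SAS) switching devices, or the like; the embodiments impose no limitation. When switching devices A and B are PCIe switching devices, the hard disks connected to them have PCIe interfaces; when they are NVMe switching devices, the hard disks have NVMe interfaces; when they are SAS switching devices, the hard disks have SAS interfaces. The hard disks shown in FIG. 2 may be mechanical hard disks, solid state disks (SSD), or disks of other media. Different disks in the storage array of FIG. 2 may use different storage media, forming a hybrid storage array; the embodiments impose no limitation.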
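Cache device M may be a storage device built from volatile or non-volatile storage media, such as phase change memory (PCM); other non-volatile storage media suitable for a cache device may also be used, and the embodiments impose no limitation. Cache device M is used to cache data and is described below with reference to specific embodiments. In the embodiments, switching device A and switching device B are PCIe switching devices and the hard disks are SSDs with PCIe interfaces, as an example.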
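In the storage array of FIG. 2, input/output manager A receives a data write request sent by a host. In one implementation, controller A is the home controller of input/output manager A: when input/output manager A receives a data operation request from a host and its request-sending policy has not been changed, it sends the request to controller A by default. In the embodiments, input/output manager A receives the data write request from the host and sends it to controller A through PCIe switching device A or B. Which PCIe switching device forwards the request may be decided by a preset rule; once a device is selected, input/output manager A subsequently communicates with controller A through it. Input/output manager A may also select a PCIe switching device at random; the embodiments impose no limitation. The embodiments use the case where input/output manager A selects PCIe switching device A as the example.
The data write request received by input/output manager A carries the address of the data to be written, which includes the identifier of the target LU, the logical block address (LBA) of the data to be written, and its length. Input/output manager A sends the data write request to controller A. On receiving it, controller A uses the target LU identifier in the address to determine whether controller A is the home controller of the target LU.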
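When controller A is the home controller of the target LU, that is, the target LU was formed by controller A through virtualizing hard disks and provided to the host, controller A determines the cache device that will cache the data to be written, which is cache device M in this embodiment. In one implementation, controller A, according to the data write request, instructs cache device M to allocate a cache address for the data, and cache device M allocates one according to the data length. Controller A obtains the cache address that cache device M allocated (called cache address M below; in one implementation, a cache address consists of a start address and a length). Controller A sends the identifier of cache device M and cache address M to input/output manager A through PCIe switching device A. On receiving them, input/output manager A writes the data to cache address M (also described as writing directly to cache address M). Controller A only obtains cache address M, and the input/output manager writes the data directly to the cache address through PCIe switching device A. Compared with the prior art, this saves the CPU computing resources and memory resources of controller A and improves write efficiency.
Controller A establishes the correspondence among the address of the data to be written, the identifier of cache device M, and cache address M. When the data is later read, controller A sends its cache address to input/output manager A, which reads the data from that cache address (also described as reading directly from the cache address). This saves the CPU computing resources and memory resources of controller A and improves read efficiency.
Once a condition is met, and if the storage array does not perform deduplication, cache device M stores the data to be written in a target SSD of the storage array. The target SSD is the SSD that stores the data to be written. Specifically, controller A may send the identifier of cache device M and cache address M to the controller of the target SSD through PCIe switching device A or B. According to the identifier and cache address M, the controller of the target SSD reads the data directly from cache address M through PCIe switching device A or B and stores it. The controller of the target SSD then sends the target SSD storage address of the data to controller A through PCIe switching device A or B. The target SSD storage address includes the controller identifier of the target SSD and the logical storage address at which the data is stored in the target SSD. Controller A establishes the correspondence between the address of the data to be written and its target SSD storage address.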
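The above process is shown in FIG. 3:
Step 301: The host sends a data write request to input/output manager A.
Input/output manager A is the device in the storage array that receives and manages input/output; it receives data operation requests from hosts and forwards them to a controller. In this embodiment, the host sends input/output manager A a data write request carrying the address of the data to be written. For example, the request may use the Small Computer System Interface (SCSI) protocol, that is, a SCSI data write request; other protocols may also be used, and the embodiments impose no limitation.
Step 302: Send the data write request to controller A.
In this embodiment, input/output manager A usually communicates with one particular controller. The correspondence between input/output manager A and a controller can be established in various ways, for example according to controller load or a particular path selection algorithm; the present invention imposes no limitation. On receiving the data write request, input/output manager A sends it to controller A through PCIe switching device A or B. This embodiment uses PCIe switching device A as the example.
Step 303: Controller A obtains the cache address for the data to be written.
Controller A receives the data write request sent by input/output manager A and determines the cache device that will cache the data, which is cache device M in this embodiment. In one implementation, cache device M allocates a range of cache addresses to controller A, and controller A allocates cache address M within that range according to the length of the data. In another implementation, controller A sends an instruction carrying the data length to cache device M through PCIe switching device A or B, instructing cache device M to allocate a cache address for the data, and controller A obtains cache address M.
Step 304: Send the identifier of cache device M and cache address M.
Having obtained cache address M, controller A sends the identifier of cache device M and cache address M to input/output manager A through PCIe switching device A. The identifier of cache device M is [text missing in the source].
Step 305: The host sends the data to be written to input/output manager A.
Input/output manager A receives the identifier of cache device M and cache address M from controller A, and receives the data to be written from the host.
Step 306: Write the data to cache address M.
According to the identifier of cache device M and cache address M, input/output manager A writes the data directly to cache address M through PCIe switching device A. Input/output manager A receives, through PCIe switching device A, the write-success response sent by cache device M, and then sends the host a write-request-complete response to notify it that the write operation has finished.
Step 307: Notify controller A that the data has been written to cache address M.
After successfully writing the data to cache address M, input/output manager A notifies controller A that the data has been written to cache address M.
Step 308: Controller A establishes the correspondence among the address of the data to be written, cache device M, and cache address M.
Controller A receives the notification sent by input/output manager A and establishes the correspondence among the address of the data to be written, cache device M, and cache address M.
When cache device M allocates cache address M for the data to be written, it establishes the correspondence between the address of the data and cache address M. Cache device M can obtain the address of the data from the cache address allocation instruction sent by controller A, and establishes the correspondence after allocating cache address M. In another implementation, cache device M is dedicated to the target LU, that is, it caches only the target LU's data. Cache device M then stores by default the correspondence among the target LU, the LBAs in the target LU, and a range of its cache addresses, and allocates cache address M for the data within that range.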
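The write path of FIG. 3 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class names `CacheDevice` and `Controller`, the function `io_manager_write`, and the bump-style allocator are all hypothetical, and the PCIe switching devices are not modeled. The point it shows is that the controller only hands out a cache address and records a mapping, while the data bytes go from the input/output manager straight to the cache device.

```python
from dataclasses import dataclass, field


@dataclass
class CacheDevice:
    """Hypothetical stand-in for cache device M."""
    device_id: str
    capacity: int
    next_free: int = 0
    memory: dict = field(default_factory=dict)  # start address -> cached bytes

    def allocate(self, length: int) -> int:
        # Simple bump allocation; the source does not specify an allocation policy.
        if self.next_free + length > self.capacity:
            raise MemoryError("cache device full")
        start = self.next_free
        self.next_free += length
        return start

    def write(self, address: int, data: bytes) -> None:
        self.memory[address] = bytes(data)


@dataclass
class Controller:
    """Hypothetical home controller of the target LU (controller A)."""
    cache: CacheDevice
    mapping: dict = field(default_factory=dict)

    def get_cache_address(self, length: int):
        # Step 303: obtain a cache address for the data to be written.
        return self.cache.device_id, self.cache.allocate(length)

    def record_mapping(self, data_address, cache_address: int) -> None:
        # Step 308: data address -> (cache device identifier, cache address).
        self.mapping[data_address] = (self.cache.device_id, cache_address)


def io_manager_write(controller, caches, lu_id, lba, data):
    """Steps 302-308 as seen from the input/output manager."""
    data_address = (lu_id, lba, len(data))
    device_id, cache_address = controller.get_cache_address(len(data))
    caches[device_id].write(cache_address, data)  # step 306: data bypasses the controller
    controller.record_mapping(data_address, cache_address)  # steps 307-308
    return device_id, cache_address
```

Note that `Controller` never touches `data`; only its length and address pass through it, which mirrors the separation of control and data described in the text.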
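To improve the reliability of the storage array, the data to be written is cached in multiple copies. In the prior art shown in FIG. 1, input/output manager A sends the data to be written, CPU A writes it into memory A, CPU A reads it from memory A and sends it through PCIe switch A to PCIe switch B, PCIe switch B sends it to CPU B, and CPU B writes it into memory B. In the embodiments of the present invention, to prevent loss of the data in cache device M, the storage array caches the data on multiple cache devices. Taking two cache devices as the example, the storage array of FIG. 2 further includes cache device N, which is connected to both PCIe switching device A and PCIe switching device B. On receiving the data write request from input/output manager A, controller A designates cache device M as the primary cache device and cache device N as the backup cache device for the data. Controller A obtains the cache addresses allocated for the data in cache device M and cache device N. In one implementation, controller A sends each of them an instruction, carrying the data length, to allocate a cache address for the data. The address allocated by cache device M is called cache address M, and the address allocated by cache device N is called cache address N. Controller A obtains cache address M and cache address N. Through PCIe switching device A, controller A sends input/output manager A the identifier of cache device M and cache address M, and the identifier of cache device N and cache address N. Controller A may send both pairs in one message or in two separate messages; no limitation is imposed here. In another implementation, cache device M allocates to controller A a dedicated range of cache addresses, used only for data of LUs whose home controller is controller A, and controller A allocates cache address M for the data directly within that range. Likewise, cache device N allocates a dedicated range of cache addresses to controller A, and controller A allocates cache address N for the data directly within that range.
Input/output manager A receives the identifier of cache device M and cache address M, and the identifier of cache device N and cache address N. According to the identifier of cache device M and cache address M, it writes the data directly to cache address M through PCIe switching device A; according to the identifier of cache device N and cache address N, it writes the data directly to cache address N through PCIe switching device A. Input/output manager A receives, through PCIe switching device A, the success response for the write to cache address M and notifies controller A to establish the correspondence among the address of the data to be written, the identifier of cache device M, and cache address M. Likewise, controller A establishes the correspondence among the address of the data to be written, the identifier of cache device N, and cache address N.
In another implementation, controller A sends the identifier of cache device M and cache address M to input/output manager A through PCIe switching device A. Input/output manager A receives them and writes the data directly to cache address M through PCIe switching device A or B. Controller A sends a data write instruction to cache device M through PCIe switching device A or B; the instruction carries the identifier of cache device N and cache address N. Cache device M caches the data and, according to the data write instruction, writes the data directly to cache address N through PCIe switching device A or B.
Controller A only obtains cache address M and cache address N; input/output manager A can then cache the data in cache device M and cache device N. This saves the CPU computing resources and memory resources of controller A and improves write efficiency.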
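In another case, input/output manager A receives a data write request from the host, and the request carries the address of the data to be written. Input/output manager A sends the request to controller A through PCIe switching device A. On receiving it, controller A determines from the target LU identifier carried in the request that controller A is not the home controller of the target LU. This embodiment is shown in FIG. 4.
Step 401: The host sends a data write request to input/output manager A.
The host sends input/output manager A a data write request that carries the address of the data to be written.
Step 402: Send the data write request to controller A.
In this embodiment, controller A is the home controller of input/output manager A. On receiving the data write request, input/output manager A sends it to controller A through PCIe switching device A or B. This embodiment uses PCIe switching device A as the example.
Step 403: Determine that controller A is not the home controller of the target LU.
Controller A receives the data write request sent by input/output manager A and, according to the target LU identifier carried in the request, determines that controller A is not the home controller of the target LU. Controller A queries the correspondence between controllers and LUs and determines that controller B is the home controller of the target LU.
Step 404: Send the data write request to controller B.
Controller A sends the data write request to controller B through PCIe switching device A or B. This embodiment uses forwarding through PCIe switching device B as the example.
Step 405: Obtain the cache address for the data to be written.
Controller B receives the data write request sent by controller A and determines the cache device that will cache the data, which is cache device M in this embodiment. For the specific implementation, refer to the way controller A obtains the cache address from cache device M, described above.
Step 406: Send the identifier of cache device M and cache address M to controller A.
Controller B obtains cache address M and sends the identifier of cache device M and cache address M to controller A through PCIe switching device B. In another implementation, controller B may send the identifier of cache device M and cache address M directly to input/output manager A through PCIe switching device A or B.
Step 407: Send the identifier of cache device M and cache address M to input/output manager A.
Controller A receives the identifier of cache device M and cache address M from controller B and sends the identifier of cache device M and cache address M of the data to be written to input/output manager A through a PCIe switching device.
Step 408: The host sends the data to be written to input/output manager A.
Input/output manager A receives the identifier of cache device M and cache address M and responds to the data write request sent by the host. The host sends the data to be written to input/output manager A.
Step 409: Write the data to cache address M.
Input/output manager A receives the data from the host and, according to the identifier of cache device M and cache address M, writes the data directly to cache address M through PCIe switching device A. Input/output manager A receives, through PCIe switching device A, the write-success response sent by cache device M, and then sends the host a write-request-complete response to notify it that the write operation has finished.
Step 410: Notify controller B that the data has been written to cache address M.
After successfully writing the data to cache address M, input/output manager A issues a notification that the data has been written to cache address M. Specifically, input/output manager A forwards the notification to controller A through PCIe switching device A, and controller A forwards it to controller B through PCIe switching device B. Alternatively, input/output manager A sends the notification directly to controller B through PCIe switching device A or B.
Step 411: Controller B establishes the correspondence among the address of the data to be written, cache device M, and cache address M.
Controller B receives the notification sent by input/output manager A and establishes the correspondence among the address of the data to be written, cache device M, and cache address M.
Cache device M establishes the correspondence between the address of the data to be written and cache address M as described in the previous embodiment; this is not repeated here.
When cache device N allocates cache address N for the data to be written, it establishes the correspondence between the address of the data and cache address N. Cache device N can obtain the address of the data from the cache address allocation instruction sent by controller A, and establishes the correspondence after allocating cache address N.
When the data needs to be cached on multiple cache devices to prevent loss of the data cached in cache device M, and controller A is not the home controller of the target LU, the process by which the data write request of input/output manager A reaches controller B is as described in the foregoing embodiment. The way controller B obtains the cache addresses follows the scenario in which controller A, as the home controller of the target LU, obtains the cache addresses of multiple cache devices. The other steps also follow the foregoing embodiments and are not repeated here.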
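The forwarding decision in steps 403 and 404 comes down to one lookup in the LU-to-home-controller correspondence. The sketch below is a hypothetical illustration: the function name `route_write_request` and the dictionary `lu_home` are assumptions introduced here, and the choice of PCIe switching device is not modeled.

```python
def route_write_request(receiving_controller, target_lu, lu_home):
    """Return the controllers a write request visits, in order.

    lu_home maps an LU identifier to the identifier of its home controller
    (an assumed representation of the controller/LU correspondence).
    """
    home = lu_home[target_lu]  # raises KeyError for an unknown LU
    if home == receiving_controller:
        # FIG. 3 case: the receiving controller is the home controller.
        return [receiving_controller]
    # FIG. 4 case: forward the request to the home controller (step 404).
    return [receiving_controller, home]
```

With `lu_home = {"LU_A": "A", "LU_B": "B"}`, a request for `"LU_B"` that arrives at controller `"A"` is forwarded to `"B"`, while a request for `"LU_A"` stays on `"A"`.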
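After the host has written data to the storage array, the host accesses the written data with a data read request. The flow is shown in FIG. 5:
Step 501: Send a data read request.
The host sends input/output manager A a data read request that carries the address of the data to be read. The address includes the identifier of the logical unit (LU) where the data is located, the LBA of the data, and its length. The host may send the request to input/output manager A using the SCSI protocol; the present invention imposes no limitation. For convenience, the data to be read here is the data to be written described above.
Step 502: Send the data read request to controller A.
Input/output manager A receives the data read request from the host and sends it to controller A through PCIe switching device A.
Step 503: Controller A sends the identifier of cache device M and cache address M to input/output manager A.
When controller A is the home controller of the LU where the data to be read is located, and the data is cached in a cache device such as cache device M, controller A queries, according to the data read request, the correspondence among the address of the data to be read, the cache device identifier, and the cache address, and determines cache address M of the data in cache device M. While the data is still cached in cache device M, its cache address in cache device M is cache address M. Controller A sends the identifier of cache device M and cache address M to input/output manager A through PCIe switching device A.
Step 504: Read the data from cache address M.
According to the identifier of cache device M and cache address M, input/output manager A reads the data directly from cache address M through PCIe switching device A.
Step 505: Return the data that was read.
Input/output manager A reads the data from cache address M and returns it to the host.
When input/output manager A, according to the data read request, sends a query for the data to be read to controller A through PCIe switching device A and controller A is not the home controller of the LU where the data is located, controller A queries the correspondence between that LU and its home controller and determines that controller B is the home controller. Controller A sends the query to controller B through PCIe switching device B. The data to be written described above is again used as the data to be read, so the address of the data to be read is the address of the data to be written described above; while the data is still cached in cache device M, its cache address in cache device M is cache address M. Controller B queries the correspondence among the address of the data to be written, the identifier of cache device M, and cache address M, determines the identifier of cache device M and cache address M, and sends them to controller A through PCIe switching device B; controller A sends them to input/output manager A through PCIe switching device A. Controller B may also send the identifier of cache device M and cache address M directly to input/output manager A through PCIe switching device A or B. Subsequent read operations follow the read operations of the foregoing embodiment and are not repeated here.
The data to be written described above is again used as the data to be read, so the address of the data to be read is the address of the data to be written described above. When the data has already been stored in the target SSD, the home controller of the LU where the data is located queries the correspondence between the address of the data to be read (the address of the data to be written) and its target SSD storage address, obtains the target SSD storage address, and sends it to input/output manager A through PCIe switching device A or B. The target SSD storage address includes the controller identifier of the target SSD and the logical address of the data in the target SSD. According to the target SSD storage address, input/output manager A reads the data directly from its logical address in the target SSD through PCIe switching device A or B.
In the above embodiments, when part of the data to be read is stored in the target SSD and part is cached in cache device M, then, as described above, input/output manager A reads the cached part directly from its cache address through PCIe switching device A or B according to the cache address of the data in the cache device, and reads the stored part directly from its logical address in the target SSD through PCIe switching device A or B according to the controller identifier of the target SSD and the logical address of the data in the target SSD. This is not described again here.
When the data to be read is cached in multiple cache devices, the home controller of the LU where the data is located usually returns to input/output manager A the identifier of the primary cache device M that caches the data and cache address M. The rest of the flow follows the read operations of the foregoing embodiments and is not repeated here.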
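The read-side lookup described above, where the home controller returns either a cache location or a target SSD storage address and the input/output manager then reads directly from it, might be sketched as follows. The function `resolve_read` and the two dictionaries are hypothetical names chosen for illustration; the source only states which correspondences the controller keeps, not how they are stored.

```python
def resolve_read(data_address, cache_map, ssd_map):
    """Return where the input/output manager should read the data from.

    cache_map: data address -> (cache device identifier, cache address)
    ssd_map:   data address -> (SSD controller identifier, logical address)
    Both maps are assumed representations of the controller's correspondences.
    """
    if data_address in cache_map:
        device_id, cache_address = cache_map[data_address]
        return ("cache", device_id, cache_address)
    if data_address in ssd_map:
        ssd_controller_id, logical_address = ssd_map[data_address]
        return ("ssd", ssd_controller_id, logical_address)
    raise KeyError(f"no location recorded for {data_address!r}")
```

Checking the cache first reflects the flow of FIG. 5, in which data still held in cache device M is read from cache address M; that ordering is a design choice of this sketch rather than something the text mandates.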
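Deleting duplicate data in a storage array saves storage space and lowers storage cost. In the storage array of the embodiment shown in FIG. 2, the host sends input/output manager A a data write request that carries the address of the data to be written. Input/output manager A sends the request to controller A through PCIe switching device A. When controller A is the home controller of the target LU of the data, controller A provides input/output manager A with the identifier of cache device M and cache address M. According to the identifier and cache address M, input/output manager A writes the data directly to cache address M through PCIe switching device A or B.
Deduplicating the data cached in cache device M before it is stored to the SSDs of the storage array saves storage space and raises storage utilization. Taking the storage array of FIG. 2 as the example, the data stored in the SSDs is deduplicated before cache device M stores it to an SSD. Deduplication divides data into data blocks according to a predetermined rule and calculates a feature value for each data block. The feature value is usually computed with a hash algorithm: the hash of the data block serves as its feature value. Commonly used hash algorithms include MD5, SHA1, SHA-256, and SHA-512. For example, if the feature value of data block A equals the feature value of data block B already stored in an SSD, data blocks A and B are identical; the duplicate data block A is deleted from cache device M, and the logical storage address at which data block B is stored in the SSD is used as the logical storage address of data block A in the SSD.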
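The block splitting and feature-value computation just described can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the 4 KB fixed block size and the choice of SHA-256 are picked here for illustration (SHA-256 is one of the algorithms the text lists), and the helper names `split_fixed`, `feature_value`, and `find_duplicates` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_fixed(data: bytes, block_size: int = BLOCK_SIZE):
    """Divide data into fixed-length blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def feature_value(block: bytes) -> str:
    """Feature value of a block, here its SHA-256 digest in hex."""
    return hashlib.sha256(block).hexdigest()


def find_duplicates(data: bytes):
    """Return (duplicate block index, index of first identical block) pairs."""
    first_seen = {}
    duplicates = []
    for index, block in enumerate(split_fixed(data)):
        fv = feature_value(block)
        if fv in first_seen:
            duplicates.append((index, first_seen[fv]))
        else:
            first_seen[fv] = index
    return duplicates
```

For example, three 4 KB blocks where the third repeats the first yield one duplicate pair pointing back at block 0, so only two blocks would need to be stored.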
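In practice, the comparison of data block feature values is performed by the controllers. Because every unique data block in a deduplicating storage array has a feature value, a large number of feature values is produced. To balance the load across the controllers, each controller can be made responsible for comparing part of the feature values according to a feature value distribution algorithm, such as a hash distribution algorithm. Each controller then maintains the feature value indexes of only part of the unique data stored in the array; such a partial collection of indexes is called a feature value index set. A controller searches its feature value index set for the feature value of a data block about to be written to an SSD and determines whether it equals a feature value in the set. For example, if the distribution algorithm assigns feature value index set A to controller A, controller A is the home controller of every feature value in feature value index set A. Put differently, the controller whose feature value index set A contains a feature value equal to that of data block X is the home controller of the feature value of data block X, and is also the home controller of every feature value in feature value index set A.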
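One simple feature value distribution algorithm is to take the feature value modulo the number of controllers. The source names hash distribution only as an example and does not specify the algorithm, so the function below is an assumption made for illustration; `home_controller` is a hypothetical name.

```python
def home_controller(feature_value_hex: str, controllers):
    """Pick the home controller of a feature value.

    feature_value_hex is the feature value as a hex string (for example a
    SHA-256 digest); controllers is an ordered list of controller identifiers.
    Modulo placement is an assumed policy, not taken from the source.
    """
    if not controllers:
        raise ValueError("at least one controller is required")
    return controllers[int(feature_value_hex, 16) % len(controllers)]
```

Every controller that evaluates this function on the same feature value and the same controller list gets the same answer, which is what lets controller A decide locally whether to query its own index set or forward the feature value to controller B. A plain modulo scheme remaps most feature values when the controller count changes; a production design would likely need something more stable.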
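Specifically, a feature value index set is made up of individual feature value indexes, as shown in FIG. 6. The feature value 1 index, for example, includes feature value 1, data block storage address 1, and a reference count. Data block storage address 1 indicates the storage address of a unique data block C in SSD A, or the storage address of data block C in a cache device. The storage address of data block C in SSD A may contain the identifier of the SSD A controller and the logical storage address at which data block C is stored in SSD A. The storage address of data block C in a cache device includes the identifier of the cache device and the cache address. Feature value 1 is the feature value of data block C. The reference count is the number of data blocks that have feature value 1. For example, when data block A is stored in the storage array for the first time, the number of data blocks with feature value 1 is 1, so the reference count is 1. When data block D with the same feature value 1 is later stored to the SSD, deduplication means data block D is not stored again, but the reference count is incremented by 1 and becomes 2. In summary, the data block address in a feature value index is either the storage address of the data block in a cache device or the target hard disk storage address of the data block. The former includes the identifier of the cache device and the cache address of the data block in the cache device; the latter includes the identifier of the target hard disk controller and the logical storage address at which the data block is stored in the target hard disk. The feature value index shown in FIG. 6 is only an example; a multi-level index or any other index form usable for deduplication is also possible, and the embodiments impose no limitation.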
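The index layout of FIG. 6 (feature value, data block storage address, reference count) maps naturally onto a small data structure. The sketch below assumes a flat in-memory dictionary; the class names `FeatureIndexEntry` and `FeatureIndexSet` and their methods are hypothetical, and, as the text notes, a real array might use a multi-level index instead.

```python
from dataclasses import dataclass


@dataclass
class FeatureIndexEntry:
    feature_value: str
    # Either (cache device identifier, cache address) or
    # (target disk controller identifier, logical storage address).
    storage_address: tuple
    ref_count: int = 1


class FeatureIndexSet:
    """Hypothetical feature value index set kept by one home controller."""

    def __init__(self):
        self._entries = {}

    def lookup(self, feature_value):
        """Return the entry for a feature value, or None when absent."""
        return self._entries.get(feature_value)

    def insert(self, feature_value, storage_address):
        """Create the index for a new unique block with reference count 1."""
        entry = FeatureIndexEntry(feature_value, storage_address)
        self._entries[feature_value] = entry
        return entry

    def add_reference(self, feature_value):
        """A duplicate block was found: increment the reference count."""
        entry = self._entries[feature_value]
        entry.ref_count += 1
        return entry.ref_count
```

Storing the first block with a given feature value calls `insert`, and each later duplicate calls `add_reference`, so the count goes from 1 to 2 exactly as in the data block D example.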
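In the storage array of FIG. 2, take controller A as the home controller of the target LU of the data blocks cached in cache device M. As in the preceding embodiments, after receiving a data write request, input/output manager A obtains the identifier of cache device M and cache address M from controller A, writes the data directly to cache address M through PCIe switching device A or B, and controller A establishes the correspondence among the address of the data to be written, the identifier of cache device M, and cache address M. Take the data at cache address M as an example of cached data of an LU homed on controller A being written from cache device M to an SSD. Deduplication requires calculating the feature values of data blocks, so the data must first be divided into data blocks according to some rule. There are two ways to do this: divide the data into fixed-length blocks, or divide it into variable-length blocks. This embodiment uses fixed-length blocks, for example 4 KB blocks: the data written at cache address M is divided into several 4 KB data blocks. Controller A records, for each data block, the identifier of its LU, its LBA, and its length; these three together are called the data block address below. Take data block X among the 4 KB blocks as the example (data block X is called the data block to be deduplicated). Controller A sends a data block feature value request, which includes the data block X address, to cache device M through PCIe switching device A or B. Cache device M sends the feature value of data block X to controller A through PCIe switching device A or B so that deduplication can be performed. The flow, shown in FIG. 7, includes: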
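Step 701: Cache device M calculates the feature value of data block X.
Controller A sends cache device M an instruction to obtain the feature value of data block X; the instruction carries the data block X address. Cache device M receives the instruction. In one case, cache device M stores the correspondence between the data block X address and cache address B. According to the data block X address carried in the instruction, it locates data block X, calculates its feature value, and caches the feature value at cache address X.
Step 702: Send the feature value of data block X to controller A.
Having obtained the feature value of data block X, cache device M sends a feature value response message to controller A, the home controller of the LU where data block X is located. The message carries the feature value of data block X, together with the identifier of cache device M, which caches the feature value, and cache address X of the feature value in cache device M.
Step 703: Determine the home controller of the feature value of data block X according to the feature value distribution algorithm.
Step 704: Controller A queries local feature value index set A.
When controller A is the home controller of the feature value of data block X, controller A queries local feature value index set A to determine whether it contains a feature value identical to that of data block X.
When feature value index set A contains a feature value identical to that of data block X, steps 705a and 706a are performed. As shown in FIG. 6, the feature value of data block X is identical to feature value 1; that is, data block X is identical to data block A.
Step 705a: Controller A updates the reference count in the feature value 1 index.
The reference count in the feature value 1 index is 1, that is, only data block A is in the storage array. Because the feature value of data block X is found to equal feature value 1, the reference count is updated to 2.
Step 706a: Controller A notifies cache device M to delete data block X.
Controller A notifies cache device M to delete data block X. Controller A establishes the correspondence between the data block X address and the feature value of data block X, or the correspondence among the data block X address, the feature value of data block X, and the storage address of data block A.
Step 704 determined that data block X is a duplicate, so data block X no longer needs to be stored in an SSD, and cache device M is notified to delete it.
When feature value index set A contains no feature value identical to that of data block X, steps 705b, 706b, 707, 708, 709, and 710 are performed.
Step 705b: Obtain cache address B at which cache device M caches data block X.
According to cache address X of the feature value of data block X in cache device M, controller A obtains cache address B of data block X from cache device M through PCIe switching device A.
Step 706b: Send the identifier of cache device M and cache address B to the controller of the target SSD.
Having obtained the identifier of cache device M and cache address B, controller A sends them to the controller of the target SSD through PCIe switching device A or B.
Step 707: The controller of the target SSD reads data block X from cache address B.
The controller of the target SSD receives the identifier of cache device M and cache address B and, according to them, reads data block X directly from cache address B through PCIe switching device A or B.
Step 708: The controller of the target SSD sends the target SSD storage address of data block X to controller A.
The controller of the target SSD reads data block X from cache address B and stores it in the target SSD. It then sends the target SSD storage address of data block X to controller A through PCIe switching device A. The target SSD storage address of data block X includes the controller identifier of the target SSD and the logical storage address at which data block X is stored in the target SSD.
Step 709: Controller A establishes the index of the feature value of data block X.
Controller A receives the target SSD storage address of data block X, establishes the index of the feature value of data block X, and sets the reference count to 1. Controller A establishes the correspondence among the address of data block X, its feature value, and its target SSD storage address. Controller A also records cache address X of the feature value of data block X. When the feature value of data block X is stored in an SSD, controller A also records the target SSD storage address of the feature value.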
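The two branches of FIG. 7 can be condensed into a single function for illustration. Everything here is a simplifying assumption: the cache device is modeled as a dictionary, the target SSD as a list, the feature value index set as a dictionary, and the feature value as a SHA-256 digest; the name `deduplicate_block` is hypothetical. In the real array the feature value is computed on the cache device and the SSD controller pulls the block itself, whereas this sketch runs all roles in one process.

```python
import hashlib


def deduplicate_block(cache, cache_address, index, ssd, ssd_id="SSD1"):
    """Deduplicate one cached block and return its storage address.

    cache: dict mapping cache address -> block bytes (stands in for cache device M)
    index: dict mapping feature value -> {"address": ..., "ref_count": int}
    ssd:   list of stored blocks (stands in for the target SSD)
    """
    block = cache[cache_address]
    fv = hashlib.sha256(block).hexdigest()  # step 701: done by the cache device
    entry = index.get(fv)                   # step 704: query the index set
    if entry is not None:
        entry["ref_count"] += 1             # step 705a: update the reference count
        del cache[cache_address]            # step 706a: delete the duplicate block
        return entry["address"]
    ssd.append(block)                       # step 707: SSD controller reads the block
    address = (ssd_id, len(ssd) - 1)        # step 708: target SSD storage address
    index[fv] = {"address": address, "ref_count": 1}  # step 709: new index, count 1
    return address
```

Running it twice on two cached copies of the same 4 KB block stores one copy on the SSD, returns the same storage address both times, and leaves the reference count at 2.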
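In another case, controller A is not the home controller of the feature value of data block X but only the home controller of the LU where data block X is located. In this embodiment, controller B is taken as the home controller of the feature value of data block X. Controller A sends the feature value of data block X to controller B through PCIe switching device A or B. Controller B receives it and queries its feature value index set B. Suppose controller B finds that feature value index set B contains a feature value identical to that of data block X, for example the feature value of data block R. Controller B notifies cache device M to delete data block X: controller B sends a delete instruction to controller A through PCIe switching device B, controller A forwards it to cache device M through PCIe switching device A, and cache device M deletes data block X. Controller B updates the reference count of the index whose feature value equals that of data block X, that is, increments it by 1. When data block R is already stored in an SSD, the data block R storage address in the data block R index includes the controller identifier of the SSD storing data block R and the logical storage address at which data block R is stored in that SSD. When data block R is in a cache device, the data block R storage address in the index includes the identifier of the cache device and the cache address. Controller A establishes the correspondence among the data block X address, the feature value of data block X, and the address of controller B, the home controller of that feature value; controller A then does not need a correspondence among data block address, feature value, and data block storage address for every data block, which effectively reduces the amount of data controller A stores. Alternatively, controller A establishes the correspondence among the data block X address, the feature value of data block X, and the data block R storage address. On a later read of data block X, controller A can determine the data block R storage address directly by querying that correspondence, and input/output manager A reads data block X directly from the data block R storage address through PCIe switching device A or B, which improves read efficiency.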
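When controller A is only the home controller of the LU where data block X is located and not the home controller of the feature value of data block X, and controller B finds that feature value index set B contains no feature value identical to that of data block X, controller B sends a request to controller A through PCIe switching device B to obtain cache address B of data block X in cache device M. Controller A forwards the request to cache device M through PCIe switching device A. Cache device M sends its identifier and cache address B to controller B. Controller B sends the identifier of cache device M and cache address B to the controller of the target SSD through PCIe switching device A or B (PCIe switching device A is used as the example here). According to them, the controller of the target SSD reads data block X directly from cache address B through PCIe switching device A or B and stores it in the target SSD. The controller of the target SSD sends the target SSD storage address of data block X to controller B through PCIe switching device A or B. Controller B receives it, establishes the feature value index of data block X, and sets the reference count in the index to 1. Controller B also records cache address X of the feature value of data block X. When the feature value of data block X is stored in an SSD, controller B also records the storage address of the feature value in the SSD.
After receiving the target SSD storage address of data block X, controller B sends controller A a notification that carries the target SSD storage address. According to the notification, controller A establishes the correspondence among the address of data block X, its feature value, and its target SSD storage address. In another implementation, when controller A is only the home controller of the LU where data block X is located and not the home controller of its feature value, controller A establishes the correspondence among the data block X address, the feature value of data block X, and the address of controller B.
In the storage array of the embodiments, the cache device computes the fingerprint of data block X, which saves the computing resources of the controller. While data block X is being stored to the target SSD, the controller only supplies the identifier of cache device M and cache address B, and the controller of the target SSD reads data block X directly from cache address B. This saves the controller's computing and memory resources and improves the performance of the storage array.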
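In the storage array of FIG. 2, data is written to the SSDs according to the deduplication operation described above. Take reading data block X as the example of input/output manager A receiving a data read request. The request carries the data block X address. Input/output manager A sends the request to controller A through PCIe switching device A. Controller A determines that it is the home controller of the LU where data block X is located. In one implementation, controller A looks up the correspondence among the address of the data block X to be read, the feature value of data block X, and the target SSD storage address of data block X, and determines the target SSD storage address. Controller A sends it to input/output manager A through PCIe switching device A. According to the target SSD storage address, input/output manager A reads data block X directly from its logical address in the target SSD through PCIe switching device A or B. In another implementation, controller A looks up the correspondence among the data block X address, the feature value of data block X, and the address of the home controller of that feature value, and determines home controller B. It then queries the feature value index of data block X in controller B to confirm the target SSD storage address of data block X; or it queries, in controller B, the feature value index of the data block that has the same feature value as data block X, determines the storage address of that data block, and the data is read from that storage address. When controller A is both the home controller of the LU where data block X is located and the home controller of the feature value of data block X, in another implementation controller A looks up the correspondence between the address of the data block X to be read and its feature value, queries feature value index set A, which controller A maintains, with that feature value to determine the storage address of data block X, and sends the storage address to input/output manager A. Input/output manager A reads the data block from the storage address of data block X through PCIe switching device A or B.
When written data is cached on multiple cache devices in the storage array of FIG. 2, deduplication is performed on the data in only one of the cache devices. It may be performed on the data in the primary cache device, or one cache device may be selected according to the load of the cache devices holding the data. The embodiments impose no limitation.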
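In another implementation of the embodiments, the input/output managers and the controllers may have no home relationship; that is, controller A is not the home controller of input/output manager A. Each input/output manager stores the correspondence between each LU and the home controller of that LU. According to the target LU identifier carried in a data operation request, the input/output manager queries that correspondence, determines the home controller of the target LU, and sends the request to it directly through PCIe switching device A or B. In addition, communication between controllers, between a controller and an SSD, between an input/output manager and a controller, between an input/output manager and an SSD, between a cache device and a controller, or between a cache device and an SSD may go through any PCIe switching device. In the embodiments, the logical storage address of data block X in the target hard disk, as contained in the target hard disk storage address, is the logical block address at which data block X is stored in the target hard disk; in this embodiment specifically, it is the logical block address at which data block X is stored in the target SSD.
FIG. 2 shows only two controllers, two switching devices, two input/output managers, and one cache device, but in a specific implementation the numbers of controllers, switching devices, input/output managers, and cache devices can be set as needed and expanded flexibly. Any input/output manager is connected through any switching device to any controller, to any hard disk, or to any cache device. Any controller is connected through any switching device to any other controller, to any hard disk, or to any cache device. Any cache device is connected to any hard disk through a switching device. A connection between any two devices through any switching device supports two-way communication. Any two switching devices are directly connected. In the storage array architecture provided by the embodiments, the controllers are logically called the controller plane, the switching devices the switching plane, the hard disks the storage plane, the input/output managers the input/output management plane, and the cache devices the cache plane. In this architecture, data read/write control is separated from the data itself. The controller implements read/write control, while the data being read or written does not flow through the controller. This saves the computing resources of the controller CPU and the memory resources of the controller, improves data writing efficiency, and improves the data processing efficiency of the storage array. The architecture allows devices such as controllers and hard disks to be expanded: controllers, switching devices, hard disks, and so on can be added flexibly according to the performance requirements of the storage array.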
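Of course, the technical solution of this embodiment may also be applied to a scenario in which the storage array contains one input/output manager, one controller, one switching device, one cache device, and several hard disks. In this scenario, writing data to the storage array, performing deduplication, and reading data may be carried out as described in the foregoing embodiments. The storage array may also include two controllers and one switching device, with each of the two controllers connected to the switching device; data writing, deduplication, and data reading in this scenario are likewise as described in the foregoing embodiments and are not repeated here. In this embodiment of the present invention, device A, based on the identifier of device B and cache address A, reads data from cache address A (also referred to as reading data directly from cache address A) or writes data to cache address A (also referred to as writing data directly to cache address A) through PCIe switching device A or PCIe switching device B. This may specifically be implemented using Direct Memory Access (DMA) technology. Here, device A and device B stand for the devices that perform the DMA access in this embodiment.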
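The controller obtains the cache address of device B and, through PCIe switching device A or PCIe switching device B, sends the identifier of device B and the cache address of device B to device C. Because the controller communicates with device B through PCIe switching device A or PCIe switching device B to obtain the cache address, it already knows the identifier of device B; once it has obtained the cache address, it can therefore send both the identifier and the cache address of device B to device C. Of course, the controller may also obtain the identifier of device B together with the cache address. The identifier of device B may be the address of device B or any other identifier that uniquely identifies the device.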
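A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable non-volatile storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a non-volatile storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned non-volatile storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.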

Claims (10)

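  1. A data deduplication method applied to a storage array, characterized in that the storage array comprises a switching device, a first controller, and a cache device, wherein the first controller is connected to the switching device, the cache device is connected to the switching device, and the switching device is connected to hard disks in the storage array; the method comprises:
    the first controller receiving, from the cache device, a feature value of a data block to be deduplicated, and searching a data block feature value index set for the feature value of the data block to be deduplicated;
    when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller obtaining, through the switching device, the cache address of the data block to be deduplicated in the cache device;
    the first controller sending, through the switching device, a data read instruction to a controller of a target hard disk, the data read instruction carrying an identifier of the cache device and the cache address;
    the controller of the target hard disk reading, through the switching device, the data block to be deduplicated from the cache address according to the identifier of the cache device and the cache address;
    the controller of the target hard disk storing the data block to be deduplicated to the target hard disk.
  2. The method according to claim 1, characterized in that the method further comprises:
    the controller of the target hard disk sending, through the switching device, a target hard disk storage address to the first controller, the target hard disk storage address comprising an identifier of the controller of the target hard disk and the logical storage address at which the data block to be deduplicated is stored in the target hard disk;
    the first controller establishing, in the data block feature value index set, a feature value index of the data block to be deduplicated, the feature value index comprising the feature value of the data block to be deduplicated and the target hard disk storage address.
  3. The method according to claim 1, characterized in that the storage array further comprises a second controller connected to the switching device; the second controller stores the address of the data block to be deduplicated and is the home controller of the target logical unit in which the data block to be deduplicated resides; and the first controller receiving, from the cache device, the feature value of the data block to be deduplicated specifically comprises:
    the cache device sending, through the switching device, the feature value of the data block to be deduplicated to the second controller;
    the second controller determining that the home controller of the feature value of the data block to be deduplicated is the first controller;
    the second controller sending, through the switching device, the feature value of the data block to be deduplicated to the first controller.
  4. The method according to claim 3, characterized in that, when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the method further comprises: the first controller sending, through the switching device, a notification to the second controller, the notification carrying the target hard disk storage address;
    the second controller establishing, according to the notification, a correspondence among the address of the data block to be deduplicated, the feature value of the data block to be deduplicated, and the target hard disk storage address.
  5. The method according to claim 3, characterized in that the method further comprises: the second controller establishing a correspondence among the address of the data block to be deduplicated, the feature value of the data block to be deduplicated, and the address of the first controller.
  6. A storage array, characterized in that the storage array comprises a switching device, a first controller, and a cache device, wherein the first controller is connected to the switching device, the cache device is connected to the switching device, and the switching device is connected to hard disks in the storage array;
    the first controller is configured to receive, from the cache device, a feature value of a data block to be deduplicated, and to search a data block feature value index set for the feature value of the data block to be deduplicated;
    when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller is further configured to obtain, through the switching device, the cache address of the data block to be deduplicated in the cache device;
    the first controller is further configured to send, through the switching device, a data read instruction to a controller of a target hard disk, the data read instruction carrying an identifier of the cache device and the cache address;
    the controller of the target hard disk is configured to read, through the switching device, the data block to be deduplicated from the cache address according to the identifier of the cache device and the cache address;
    the controller of the target hard disk is further configured to store the data block to be deduplicated to the target hard disk.
  7. The storage array according to claim 6, characterized in that
    the controller of the target hard disk is further configured to send, through the switching device, a target hard disk storage address to the first controller, the target hard disk storage address comprising an identifier of the controller of the target hard disk and the logical storage address at which the data block to be deduplicated is stored in the target hard disk;
    the first controller is further configured to establish, in the data block feature value index set, a feature value index of the data block to be deduplicated, the feature value index comprising the feature value of the data block to be deduplicated and the target hard disk storage address.
  8. The storage array according to claim 6, characterized in that the storage array further comprises a second controller connected to the switching device; the second controller is configured to store the address of the data block to be deduplicated and is the home controller of the target logical unit in which the data block to be deduplicated resides; and the first controller receiving, from the cache device, the feature value of the data block to be deduplicated specifically comprises:
    the cache device sending, through the switching device, the feature value of the data block to be deduplicated to the second controller;
    the second controller determining that the home controller of the feature value of the data block to be deduplicated is the first controller;
    the second controller sending, through the switching device, the feature value of the data block to be deduplicated to the first controller.
  9. The storage array according to claim 8, characterized in that, when the feature value of the data block to be deduplicated is not found in the data block feature value index set, the first controller is further configured to send, through the switching device, a notification to the second controller, the notification carrying the target hard disk storage address;
    the second controller is further configured to establish, according to the notification, a correspondence among the address of the data block to be deduplicated, the feature value of the data block to be deduplicated, and the target hard disk storage address.
  10. The storage array according to claim 8, characterized in that the second controller is further configured to establish a correspondence among the address of the data block to be deduplicated, the feature value of the data block to be deduplicated, and the address of the first controller.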
PCT/CN2014/086530 2014-09-15 2014-09-15 Data deduplication method and storage array WO2016041127A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
CN201480001884.0A CN105612489B (zh) 2014-09-15 2014-09-15 Data deduplication method and storage array
JP2016547563A JP6254293B2 (ja) 2014-09-15 2014-09-15 Data deduplication method and storage array
BR112016003763-4A BR112016003763B1 (pt) 2014-09-15 2014-09-15 Data deduplication method and storage array
AU2014403332A AU2014403332B2 (en) 2014-09-15 2014-09-15 Data deduplication method and storage array
KR1020167005272A KR101716264B1 (ko) 2014-09-15 2014-09-15 Data deduplication method and storage array
PCT/CN2014/086530 WO2016041127A1 (zh) 2014-09-15 2014-09-15 Data deduplication method and storage array
CA2920004A CA2920004C (en) 2014-09-15 2014-09-15 Data deduplication method and storage array
EP14898354.7A EP3037949B1 (en) 2014-09-15 2014-09-15 Data duplication method and storage array
US15/449,083 US20170177489A1 (en) 2014-09-15 2017-03-03 Data deduplication system and method in a storage array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/086530 WO2016041127A1 (zh) 2014-09-15 2014-09-15 Data deduplication method and storage array

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/449,083 Continuation US20170177489A1 (en) 2014-09-15 2017-03-03 Data deduplication system and method in a storage array

Publications (1)

Publication Number Publication Date
WO2016041127A1 true WO2016041127A1 (zh) 2016-03-24

Family

ID=55532419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086530 WO2016041127A1 (zh) 2014-09-15 2014-09-15 Data deduplication method and storage array

Country Status (9)

Country Link
US (1) US20170177489A1 (zh)
EP (1) EP3037949B1 (zh)
JP (1) JP6254293B2 (zh)
KR (1) KR101716264B1 (zh)
CN (1) CN105612489B (zh)
AU (1) AU2014403332B2 (zh)
BR (1) BR112016003763B1 (zh)
CA (1) CA2920004C (zh)
WO (1) WO2016041127A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
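JP2018532166A (ja) * 2016-09-28 2018-11-01 華為技術有限公司 Huawei Technologies Co.,Ltd. Method for deduplication in a storage system, storage system, and controller
CN114003181A (zh) * 2022-01-04 2022-02-01 苏州浪潮智能科技有限公司 Data write mirroring system, method, apparatus, electronic device, and storage medium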

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10705969B2 (en) * 2018-01-19 2020-07-07 Samsung Electronics Co., Ltd. Dedupe DRAM cache
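CN108897806A (zh) * 2018-06-15 2018-11-27 东软集团股份有限公司 Data consistency comparison method and apparatus, storage medium, and electronic device
CN112714910B (zh) * 2018-12-22 2022-12-27 华为云计算技术有限公司 Distributed storage system and computer program product
WO2021016728A1 (zh) * 2019-07-26 2021-02-04 华为技术有限公司 Data processing method and apparatus in a storage system, and computer-readable storage medium
CN113411398B (zh) * 2021-06-18 2022-02-18 全方位智能科技(南京)有限公司 Big-data-based system and method for file cleanup, writing, and cleanup management
CN113253947B (zh) * 2021-07-16 2021-10-15 苏州浪潮智能科技有限公司 Deduplication method, apparatus, device, and readable storage medium
CN113627132B (zh) * 2021-08-27 2024-04-02 智慧星光(安徽)科技有限公司 Method and system for generating data deduplication marker codes, electronic device, and storage medium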

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
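CN101809559A (zh) * 2007-09-05 2010-08-18 伊姆西公司 De-duplication in virtualized server and virtualized storage environments
CN102063274A (zh) * 2010-12-30 2011-05-18 成都市华为赛门铁克科技有限公司 Storage array, storage system, and data access method
CN102156703A (zh) * 2011-01-24 2011-08-17 南开大学 Low-power, high-performance data deduplication system
CN102982122A (zh) * 2012-11-13 2013-03-20 浪潮电子信息产业股份有限公司 Data deduplication method suitable for mass storage systems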

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW579463B (en) * 2001-06-30 2004-03-11 Ibm System and method for a caching mechanism for a central synchronization server
US7428613B1 (en) * 2004-06-29 2008-09-23 Crossroads Systems, Inc. System and method for centralized partitioned library mapping
JP4901316B2 (ja) * 2006-06-06 2012-03-21 株式会社日立製作所 Storage system and storage control apparatus
JP4394670B2 (ja) * 2006-09-28 2010-01-06 株式会社日立製作所 Disk control apparatus and storage system
US8209506B2 (en) 2007-09-05 2012-06-26 Emc Corporation De-duplication in a virtualized storage environment
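JP4480756B2 (ja) * 2007-12-05 2010-06-16 富士通株式会社 Storage management apparatus, storage system control apparatus, storage management program, data storage system, and data storage method
JP2009251725A (ja) * 2008-04-02 2009-10-29 Hitachi Ltd Storage control apparatus and duplicate data detection method using the storage control apparatus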
US8185691B2 (en) * 2008-06-30 2012-05-22 Netapp, Inc. Optimized cache coherency in a dual-controller storage array
WO2010045262A1 (en) * 2008-10-14 2010-04-22 Wanova Technologies, Ltd. Storage-network de-duplication
US8595397B2 (en) * 2009-06-09 2013-11-26 Netapp, Inc Storage array assist architecture
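JP4838878B2 (ja) * 2009-12-04 2011-12-14 富士通株式会社 Data management program, data management apparatus, and data management method
JP4892072B2 (ja) * 2010-03-24 2012-03-07 株式会社東芝 Storage apparatus that eliminates duplicate data in cooperation with a host apparatus, storage system including the storage apparatus, and deduplication method in the system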
US9110936B2 (en) * 2010-12-28 2015-08-18 Microsoft Technology Licensing, Llc Using index partitioning and reconciliation for data deduplication
CN102833298A (zh) * 2011-06-17 2012-12-19 英业达集团(天津)电子技术有限公司 Distributed data deduplication system and processing method thereof
CN102866935B (zh) * 2011-07-07 2014-11-12 北京飞杰信息技术有限公司 iSCSI-based instant copy method and storage system
US8620886B1 (en) * 2011-09-20 2013-12-31 Netapp Inc. Host side deduplication
US9229853B2 (en) * 2011-12-20 2016-01-05 Intel Corporation Method and system for data de-duplication
KR101341995B1 (ko) * 2011-12-26 2013-12-16 성균관대학교산학협력단 Apparatus and method for managing a shared data store
US9417811B2 (en) * 2012-03-07 2016-08-16 International Business Machines Corporation Efficient inline data de-duplication on a storage system
US8856443B2 (en) * 2012-03-12 2014-10-07 Infinidat Ltd. Avoiding duplication of data units in a cache memory of a storage system
US8706971B1 (en) * 2012-03-14 2014-04-22 Netapp, Inc. Caching and deduplication of data blocks in cache memory
WO2014136183A1 (ja) * 2013-03-04 2014-09-12 株式会社日立製作所 Storage apparatus and data management method
US9602428B2 (en) * 2014-01-29 2017-03-21 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for locality sensitive hash-based load balancing
US9760578B2 (en) * 2014-07-23 2017-09-12 International Business Machines Corporation Lookup-based data block alignment for data deduplication
WO2016013086A1 (ja) * 2014-07-24 2016-01-28 株式会社日立製作所 Computer system and memory allocation management method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3037949A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
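JP2018532166A (ja) * 2016-09-28 2018-11-01 華為技術有限公司 Huawei Technologies Co.,Ltd. Method for deduplication in a storage system, storage system, and controller
CN114003181A (zh) * 2022-01-04 2022-02-01 苏州浪潮智能科技有限公司 Data write mirroring system, method, apparatus, electronic device, and storage medium
CN114003181B (zh) * 2022-01-04 2022-05-20 苏州浪潮智能科技有限公司 Data write mirroring system, method, apparatus, electronic device, and storage medium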

Also Published As

Publication number Publication date
KR20160047482A (ko) 2016-05-02
CA2920004C (en) 2019-10-22
EP3037949B1 (en) 2019-07-31
AU2014403332B2 (en) 2017-04-20
KR101716264B1 (ko) 2017-03-14
CN105612489B (zh) 2017-08-29
CN105612489A (zh) 2016-05-25
CA2920004A1 (en) 2016-03-15
EP3037949A1 (en) 2016-06-29
US20170177489A1 (en) 2017-06-22
BR112016003763B1 (pt) 2019-04-02
EP3037949A4 (en) 2016-10-12
JP6254293B2 (ja) 2017-12-27
JP2017505487A (ja) 2017-02-16

Similar Documents

Publication Publication Date Title
US10042560B2 (en) Method and storage array for processing a write data request
WO2016041127A1 (zh) Data deduplication method and storage array
US10891054B2 (en) Primary data storage system with quality of service
US10853274B2 (en) Primary data storage system with data tiering
US10169365B2 (en) Multiple deduplication domains in network storage system
US20200019516A1 (en) Primary Data Storage System with Staged Deduplication
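TWI771933B (zh) Method for performing deduplication management with the aid of a command-related filter, host device, and storage server
JP6924671B2 (ja) Data write request processing method and storage array
US11226769B2 (en) Large-scale storage system and data placement method in large-scale storage system
JP6552583B2 (ja) Data deduplication method and storage array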

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2014898354

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2920004

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2014403332

Country of ref document: AU

ENP Entry into the national phase

Ref document number: 20167005272

Country of ref document: KR

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112016003763

Country of ref document: BR

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14898354

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016547563

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 112016003763

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20160222