WO2016079809A1 - Storage unit, file server, and data storage method - Google Patents

Storage unit, file server, and data storage method

Info

Publication number
WO2016079809A1
Authority
WO
WIPO (PCT)
Prior art keywords
chunk
container
work
processor
storage device
Prior art date
Application number
PCT/JP2014/080513
Other languages
English (en)
Japanese (ja)
Inventor
ファビオ 平良
Original Assignee
株式会社日立製作所
株式会社日立情報通信エンジニアリング
Priority date
Filing date
Publication date
Application filed by 株式会社日立製作所 and 株式会社日立情報通信エンジニアリング
Priority to PCT/JP2014/080513
Publication of WO2016079809A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures

Definitions

  • The present invention relates to a storage apparatus.
  • A deduplication technique is known in which a storage device identifies a plurality of data items that duplicate each other and stores only one of them.
  • When the storage device receives write data, it reads the data related to partial data of the write data, out of the data stored in the storage device, into the memory, and determines whether or not the partial data duplicates the read data.
  • The partial data determined not to be a duplicate is written to the storage device.
  • The storage apparatus reads other data from the storage device and replaces the data in the memory with that other data.
  • the backup device uses the first duplication determination information to determine that the content is not stored in the storage device, and the storage device uses the second duplication determination information to store the content in the storage device.
  • The storage device transmits the second duplication determination information to the backup device, and the backup device incorporates the received second duplication determination information into the first duplication determination information.
  • a storage apparatus includes a storage device, a memory, and a processor connected to the storage device, the memory, and a host computer.
  • When the processor receives write data from the host computer, the processor creates, in the memory, a work container, which is a container for holding data, and divides the write data into a plurality of chunks.
  • The processor selects one of the plurality of chunks as a work chunk in order of chunk position, where the chunk position indicates the position of each of the plurality of chunks in the write data.
  • Based on index information about the chunks in the storage device, the processor determines whether or not a matching chunk that may overlap with the work chunk is stored in the storage device. If it determines that the matching chunk is stored in the storage device, the processor reads a conforming container, which contains the matching chunk, into the memory and determines, based on the conforming container, whether the work chunk overlaps with the matching chunk. If it determines that the work chunk does not overlap with the matching chunk, the processor includes the work chunk in the work container and determines whether the work container satisfies a predetermined close condition.
  • If it determines that the work container satisfies the close condition, the processor determines whether or not the work container satisfies a predetermined registration condition. If it determines that the work container satisfies the registration condition, the processor includes chunk information based on each chunk in the work container in the index information. If it determines that the work container satisfies the close condition, the processor also writes the work container to the storage device.
  • Processing to replace data in memory with data in storage device can be reduced, and the performance of the storage device can be improved.
  • The deduplication process of the comparative example in a 1st state is shown.
  • The deduplication process of the comparative example in a 2nd state is shown.
  • Stage processing is shown.
  • Destage processing is shown.
  • The deduplication process of the comparative example in a 3rd state is shown.
  • The deduplication process of the comparative example in a 4th state is shown.
  • The deduplication process of the comparative example in a 5th state is shown.
  • The deduplication process of the comparative example in a 6th state is shown.
  • The deduplication process of the embodiment in a 1st state is shown.
  • The deduplication process of the embodiment in a 2nd state is shown.
  • The deduplication process of the embodiment in a 3rd state is shown.
  • The deduplication process of the embodiment in a 7th state is shown.
  • The deduplication process of the embodiment in an 8th state is shown.
  • The structure of a file server is shown.
  • A functional configuration of the storage apparatus 10 is shown.
  • The structure of the processing data 420 is shown.
  • Backup processing is shown.
  • Chunk write processing is shown.
  • The conforming container stage process is shown.
  • New chunk storage processing is shown.
  • The container close determination process is shown.
  • The container close process is shown.
  • Read processing is shown.
  • the information of the present embodiment will be described using the expressions “file” and “index”, but these information may not necessarily be expressed by these data structures.
  • it may be expressed by a data structure such as “table”, “list”, “DB (database)”, “queue”, or the like. Therefore, “file”, “index”, “table”, “list”, “DB”, “queue”, and the like can be simply referred to as “information” in order to show that they do not depend on the data structure.
  • In describing the content of each piece of information, the expressions “identification information”, “identifier”, “name”, “ID”, and “number” are used, and these expressions are interchangeable.
  • In the following description, a program may be the subject of a sentence. However, because a program performs its defined processing by being executed by a CPU (Central Processing Unit) while using a memory and a communication port (communication control device), the CPU may equally be regarded as the subject.
  • the processing disclosed with the program as the subject may be processing performed by a computer such as a server computer, a storage controller, a management computer, or an information processing apparatus. Part or all of the program may be realized by dedicated hardware, or may be modularized.
  • Various programs may be installed in each computer by a program distribution server or a storage medium.
  • the management computer has input / output devices.
  • input / output devices include a display, a keyboard, and a pointer device, but other devices may be used.
  • As an alternative to the input / output device, a serial interface or an Ethernet interface may be used. A display computer having a display, a keyboard, or a pointer device is connected to the interface, and the display information is transmitted to the display computer and input is received from it, so that the display computer performs the display and accepts the input in place of the input / output device.
  • A set of one or more computers that manage the computer system and display the display information of the present invention may be referred to as a management system.
  • When the management computer displays the display information, the management computer is the management system. A combination of the management computer and the display computer is also a management system.
  • A plurality of computers may realize processing equivalent to that of the management computer. In that case, the plurality of computers (including the display computer, if the display computer performs the display) constitute the management system.
  • FIG. 1 shows the deduplication processing of the comparative example in the first state.
  • This figure shows a first state in which the storage device 90b of the comparative example receives new backup data BD1 from the host computer 80.
  • As deduplication processing, the storage apparatus 90b divides the backup data BD1 into chunks a, b, ..., z (S11). Each chunk has a preset chunk length.
  • The storage apparatus 90b sequentially determines whether each of the chunks a, b, ..., z duplicates an existing chunk stored in the storage apparatus 90b, and determines that none of them duplicates an existing chunk, that is, that all of them are new chunks.
  • the storage apparatus 90b creates a new container CN1 for storing a new chunk on the memory as a work container, and sequentially stores the new chunks in the work container (S12).
  • the number of chunks stored in the work container is referred to as the number of stored chunks.
  • the storage apparatus 90b creates a new container CN2 as a work container and stores subsequent new chunks in the work container.
  • FIG. 2 shows the deduplication processing of the comparative example in the second state.
  • This figure shows a second state in which the storage apparatus 90b receives the next backup data BD2 from the host computer 80 after the first state.
  • the storage apparatus 90b divides the backup data BD2 into a plurality of chunks as deduplication processing (S21), and determines that the chunks f1, o1, and p1 are new chunks.
  • the storage apparatus 90b creates a new container CN3 as a work container and stores the new chunk in the work container (S22).
  • FIG. 3 shows stage processing
  • the storage device 90b includes a memory 91b and a disk 92b.
  • the disk 92b stores a container.
  • the memory 91b stores backup data received from the host computer 80 and a container read from the disk 92b for duplication determination.
  • The storage device 90b selects chunks in the backup data in order from the top of the backup data and performs duplication determination, which determines whether the selected chunk duplicates data stored in the storage device 90b.
  • the storage apparatus 90b reads (stages) the container related to the chunk i in the backup data from the disk 92b to the memory 91b (S31), and compares the chunk i with the chunk on the memory 91b.
  • FIG. 4 shows the destage processing
  • A stage container number upper limit value, which is the upper limit on the number of staged containers (stage containers), is set in advance.
  • When the number of stage containers has reached this upper limit, the storage apparatus 90b invalidates (destages) the container that was staged earliest (S41). Thereafter, the storage apparatus 90b stages the container including the next chunk for duplication determination (S42).
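The stage and destage behavior of the comparative example can be sketched as a small first-in, first-out cache. This is only an illustrative sketch: the class name `StageCache`, the dictionary standing in for the disk, and the container contents are assumptions made for the example and do not come from the patent text.

```python
from collections import OrderedDict


class StageCache:
    """Hypothetical model of staged containers, kept in staging order."""

    def __init__(self, max_staged):
        self.max_staged = max_staged   # stage container number upper limit value
        self.staged = OrderedDict()    # container ID -> chunks, oldest staged first
        self.stage_count = 0           # number of reads from the disk

    def stage(self, container_id, disk):
        if container_id in self.staged:
            return self.staged[container_id]      # already in memory, no disk read
        if len(self.staged) >= self.max_staged:
            self.staged.popitem(last=False)       # destage the earliest staged container
        self.staged[container_id] = disk[container_id]
        self.stage_count += 1
        return self.staged[container_id]


# Hypothetical disk contents; only the container IDs follow the figures.
disk = {"CN5": ["a1", "k1", "o1"], "CN6": ["b"], "CN7": ["j"], "CN8": ["x"]}
cache = StageCache(max_staged=3)
for container_id in ["CN5", "CN6", "CN7", "CN8", "CN5"]:
    cache.stage(container_id, disk)
```

With an upper limit of three, the fourth distinct container evicts CN5, so the final access to CN5 has to read it from the disk again. This repeated restaging is the cost that the embodiment tries to reduce.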
  • FIG. 5 shows the deduplication processing of the comparative example in the third state.
  • This figure shows a third state in which the storage apparatus 90b receives the next backup data BD5 from the host computer 80 after receiving the backup data from the host computer 80.
  • the storage apparatus 90b divides the backup data BD5 into a plurality of chunks, and determines that chunks a1, k1, and o1 are new chunks (S51).
  • the storage apparatus 90b creates a new container CN5, stores the new chunks a1, k1, and o1 in the container (S52), and destages the container (S53).
  • FIG. 6 shows the deduplication process of the comparative example in the fourth state.
  • This figure shows a fourth state in which the storage apparatus 90b receives the next backup data BD6 from the host computer 80 after the third state.
  • The storage apparatus 90b divides the backup data BD6 into a plurality of chunks and, in order to perform duplication determination on the received chunk a1, selects the container CN5, which includes the chunks a1, k1, and o1, from the disk 92b (S61). The storage apparatus 90b then stages the selected container CN5 (S62) and compares the received chunk a1 with the staged chunk a1.
  • FIG. 7 shows the deduplication processing of the comparative example in the fifth state.
  • This figure shows a fifth state after the fourth state, in which the storage apparatus 90b has performed duplication determination on chunk b through chunk i in the backup data BD6 and performs duplication determination on chunk j.
  • The storage device 90b determines that the containers CN5, CN6, and CN7 have been staged by the preceding duplication determinations (S71) and that the number of stage containers has reached the stage container number upper limit value.
  • FIG. 8 shows the deduplication processing of the comparative example in the sixth state.
  • This figure shows the sixth state after the fifth state in which the storage apparatus 90b determines that the chunk k1 after the chunk j is duplicated in the backup data BD6.
  • Because the number of stage containers has reached the stage container number upper limit value as a result of the duplication determinations up to the chunk j (S81), the storage device 90b destages the oldest staged container CN6 (S82).
  • The storage device 90b then stages the container CN5, which includes the chunk a1, from the disk 92b again (S83). That is, the storage apparatus 90b stages the container CN5 again shortly after destaging it. As the number of destages and stages increases, the performance of the storage device 90b decreases.
  • FIG. 9 shows the deduplication processing of the embodiment in the first state.
  • The storage apparatus 90 of the present embodiment divides the backup data BD1 into chunks a, b, ..., z and assigns a chunk number to each chunk in order from the top of the backup data BD1 (S1111). Instead of the chunk number, a chunk position indicating the position of the chunk in the backup data BD1, such as an offset from the head of the backup data BD1, may be used. Thereafter, the storage apparatus 90 performs duplication determination on each chunk in order of chunk number, opens the new container CN1 as a work container in the memory, and stores the new chunks in the work container in order (S1112). Opening a work container means creating the work container in the memory and making it possible to add new chunks to it.
  • When a new chunk is stored in the work container, the storage device 90 records the first chunk number in the work container as head (starting chunk number) and the last chunk number in the work container as tail (endpoint chunk number). When the number of stored chunks reaches the upper limit value of the number of stored chunks, the storage device 90 closes the work container and creates a new container CN2 as a work container. Closing the work container means making it impossible to add new chunks to the work container, writing the work container to the disk, and invalidating (destaging) the work container in the memory.
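The open, add, and close lifecycle of a work container, including the head and tail bookkeeping, might look like the following sketch. The class name `WorkContainer`, its method names, and the dictionary used as a stand-in for the disk are all hypothetical and chosen only for illustration.

```python
class WorkContainer:
    """Hypothetical in-memory work container that tracks head and tail."""

    def __init__(self, container_id, max_chunks):
        self.container_id = container_id
        self.max_chunks = max_chunks   # upper limit value of the number of stored chunks
        self.chunks = []               # (chunk number, chunk data) pairs
        self.head = None               # chunk number of the first stored chunk
        self.tail = None               # chunk number of the last stored chunk
        self.is_open = True            # opening means new chunks may be added

    def add(self, chunk_number, data):
        if not self.is_open:
            raise ValueError("cannot add a chunk to a closed container")
        if self.head is None:
            self.head = chunk_number
        self.tail = chunk_number
        self.chunks.append((chunk_number, data))

    def is_full(self):
        return len(self.chunks) >= self.max_chunks

    def close(self, disk):
        # Closing forbids further additions and writes the container out.
        self.is_open = False
        disk[self.container_id] = list(self.chunks)


disk = {}
work = WorkContainer("CN1", max_chunks=3)
for number, data in [(0, b"a"), (1, b"b"), (2, b"c")]:
    work.add(number, data)
if work.is_full():
    work.close(disk)
```

In this sketch the caller decides when to close the container; invalidating the in-memory copy after the write is left out for brevity.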
  • FIG. 10 shows the deduplication processing of the embodiment in the second state.
  • The storage device 90 divides the backup data BD2 into a plurality of chunks (S1121), opens the new container CN3 as a work container, detects the new chunks f1, o1, and p1 in the backup data BD2, stores the new chunks f1, o1, and p1 in the work container (S1122), and closes the work container.
  • FIG. 11 shows the deduplication processing of the embodiment in the seventh state.
  • This figure shows a seventh state in which the storage apparatus 90 receives the next backup data BD5 from the host computer 80 after receiving the backup data.
  • the storage device 90 includes a memory 91 and a disk 92.
  • The storage apparatus 90 creates a work container in the memory 91 by opening the new container CN5 as a work container, detects the new chunks a1, k1, and o1 in the backup data BD5, and stores the new chunks a1, k1, and o1 in the work container (S1131). Furthermore, the storage apparatus 90 determines whether or not the work container satisfies a predetermined registration condition.
  • the registration condition is that the interval evaluation value indicating the spread of chunk numbers in the work container is smaller than a predetermined interval evaluation threshold, and the number of stored chunks is larger than a predetermined lower limit value of the number of stored chunks.
  • the interval evaluation value is represented by, for example, “(tail-head) / number of stored chunks”.
  • the interval evaluation value increases as the interval between chunk numbers of a plurality of chunks in the work container increases.
  • When it determines that the work container does not satisfy the registration condition, the storage apparatus 90 writes the work container to the disk 92 as an unregistered container (S1132).
  • By not including chunk information for the chunks of the unregistered container in the index information, the storage apparatus 90 prevents those chunks from being referenced in subsequent duplication determinations. The storage apparatus 90 then closes the work container (S1133).
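As a worked illustration of the registration condition, the sketch below computes the interval evaluation value “(tail-head) / number of stored chunks” and applies both parts of the condition. The function names, the threshold values, and the chunk numbers assigned to a1, k1, and o1 (0, 10, and 14) are assumptions made for the example.

```python
def interval_evaluation(head, tail, stored_chunks):
    """Interval evaluation value: (tail - head) / number of stored chunks."""
    return (tail - head) / stored_chunks


def satisfies_registration(head, tail, stored_chunks,
                           interval_threshold, min_stored_chunks):
    """True when the container is dense enough and holds enough chunks."""
    if stored_chunks == 0:
        return False
    return (interval_evaluation(head, tail, stored_chunks) < interval_threshold
            and stored_chunks > min_stored_chunks)


# Sparse container: three chunks spread over chunk numbers 0 to 14 (hypothetical).
sparse = satisfies_registration(head=0, tail=14, stored_chunks=3,
                                interval_threshold=2.0, min_stored_chunks=2)

# Dense container: 26 consecutive chunks numbered 0 to 25 (hypothetical).
dense = satisfies_registration(head=0, tail=25, stored_chunks=26,
                               interval_threshold=2.0, min_stored_chunks=2)
```

With these assumed values the sparse container has an interval evaluation value of about 4.67 and would be written as an unregistered container, while the dense container has a value below 1 and would be registered.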
  • FIG. 12 shows the deduplication processing of the embodiment in the eighth state.
  • This figure shows an eighth state in which the storage apparatus 90 receives the next backup data BD6 from the host computer 80 after the seventh state.
  • When the storage device 90 performs duplication determination on the chunk a1 in the backup data BD6 (S1141), the container CN5 on the disk 92, which includes the chunks a1, k1, and o1, is an unregistered container.
  • The storage device 90 therefore does not perform duplication determination using the container CN5 (S1142), determines that the chunk a1 is a new chunk (S1143), opens the new container CN12 as a work container, stores the chunk a1 in the work container, and closes the work container. Consequently, the storage apparatus 90 does not stage the unregistered container CN5, and it can reduce the number of stages and destages compared with the storage apparatus 90b.
  • In this deduplication process, the work container may come to hold a first new chunk, a second new chunk whose chunk number is close to that of the first new chunk, and a third new chunk whose chunk number is far from that of the second new chunk.
  • In that case, the deduplication processing determines that the work container is an unregistered container, and duplication determination is no longer performed for any chunk in the unregistered container, so the deduplication rate decreases. The deduplication processing of this embodiment therefore performs the following processing.
  • FIG. 13 shows the deduplication processing of the embodiment in the ninth state.
  • This figure shows a ninth state in which the storage apparatus 90 receives the next backup data BD7 from the host computer 80 after the eighth state.
  • the storage apparatus 90 divides the backup data BD7 into a plurality of chunks as deduplication processing, and determines that chunks a1, b1, c1, and o1 are new chunks.
  • the storage apparatus 90 creates the new container CN13 as a work container, and sequentially selects new chunks as work chunks.
  • the storage apparatus 90 determines whether or not the work container satisfies a predetermined separation condition.
  • The separation condition is that a distance evaluation value, which indicates the distance from the chunk number of the immediately preceding work chunk to the chunk number of the current work chunk, is larger than a predetermined distance evaluation threshold, and that the number of stored chunks is larger than the separation chunk number threshold.
  • The distance evaluation value is represented by, for example, “work chunk number − tail”.
  • the storage apparatus 90 can store the chunks whose positions are close to each other in the backup data in one container, and can store the chunks whose positions are separated from each other in the backup data in different containers. Thereby, the storage apparatus 90 can use a container including a plurality of chunks close to each other, such as a plurality of consecutive chunks, as a registered container, and can suppress a decrease in the deduplication rate.
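A minimal sketch of how the separation condition could split new chunks across containers follows. It assumes that the condition holds when the distance evaluation value exceeds its threshold; the function names, the thresholds, and the chunk numbers 0, 1, 2, and 14 assigned to a1, b1, c1, and o1 are hypothetical, and closing and writing the previous container is reduced to starting a new list.

```python
def satisfies_separation(work_chunk_number, tail, stored_chunks,
                         distance_threshold, separation_chunk_threshold):
    """Distance evaluation value is (work chunk number - tail)."""
    distance = work_chunk_number - tail
    return (distance > distance_threshold
            and stored_chunks > separation_chunk_threshold)


def pack_new_chunks(new_chunk_numbers, distance_threshold,
                    separation_chunk_threshold):
    """Group new chunk numbers into containers, separating distant chunks."""
    containers = [[]]
    for number in new_chunk_numbers:
        current = containers[-1]
        if current and satisfies_separation(number, current[-1], len(current),
                                            distance_threshold,
                                            separation_chunk_threshold):
            containers.append([])   # start a new work container for the far chunk
        containers[-1].append(number)
    return containers


# Hypothetical chunk numbers for the new chunks a1, b1, c1, and o1.
packed = pack_new_chunks([0, 1, 2, 14], distance_threshold=5,
                         separation_chunk_threshold=2)
```

Under these assumptions the three adjacent chunks share one container and the distant chunk is placed in a container of its own, so the first container stays dense enough to be registered.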
  • FIG. 14 shows the configuration of the file server of this embodiment.
  • the file server of this embodiment includes the storage device 10 and the host computer 30.
  • the storage device 10 corresponds to the storage device 90 described above.
  • the storage apparatus 10 includes a plurality of nodes 40 (N0, N1,... Nn) and a disk array 60.
  • Each of the plurality of nodes 40 is connected to the host computer 30 via a communication network such as a LAN (Local Area Network).
  • the disk array 60 corresponds to the disk 92 described above.
  • the disk array 60 is connected to a plurality of nodes 40 via a communication network such as an FC (Fibre Channel) cable.
  • the number of nodes 40 may be one.
  • Instead of the disk array 60, another storage apparatus including storage devices such as HDDs (Hard Disk Drives) or SSDs (Solid State Drives) may be used.
  • the node 40 includes a CPU (Central Processing Unit) 110, a memory 120, an FC port 130, an HDD 140, and a network interface (NW I / F) 150.
  • the HDD 140 stores programs and data for the node 40.
  • the CPU 110 executes processing according to a program in the HDD 140.
  • the memory 120 corresponds to the memory 91 described above.
  • the memory 120 stores programs read from the HDD 140, data used for processing, data transmitted to and received from the host computer 30, data communicated to the disk array 60, and the like.
  • the FC port 130 is connected to the disk array 60 and communicates with the disk array 60 in accordance with an instruction from the CPU 110.
  • the NW I / F 150 is connected to the host computer 30 and communicates with the host computer 30 in accordance with an instruction from the CPU 110.
  • the disk array 60 includes two duplicated controllers 210 (CTL0, CTL1) and a plurality of HDDs 230.
  • the controller 210 is connected to the node 40 and performs I / O processing for a plurality of HDDs 230 in accordance with instructions from the node 40.
  • the plurality of HDDs 230 store data from the node 40.
  • the number of controllers 210 may be one.
  • the controller 210 creates an LU using a plurality of HDDs 230 and provides it to the node 40.
  • the controller 210 includes a CPU 211, a memory 212, an FC port 213, and a disk interface (I / F) 214.
  • the memory 212 stores programs and data.
  • the CPU 211 executes processing such as RAID (Redundant Arrays of Inexpensive Disks) according to a program in the memory 212.
  • the FC port 213 is connected to the node 40 via the FC, and communicates with the node 40 in accordance with an instruction from the CPU 211.
  • the disk I / F 214 is connected to a plurality of HDDs 230 and accesses the HDDs 230 according to instructions from the CPU 211.
  • Instead of FC, another type of SAN (Storage Area Network) may be used.
  • the host computer 30 is connected to a terminal device via a communication network.
  • the host computer 30 stores data accessed from the terminal device.
  • the host computer 30 creates backup data based on the stored data, and transmits a request to write the backup data to the storage apparatus 10.
  • the host computer 30 transmits a request for reading backup data to the storage apparatus 10 in order to restore the data.
  • the file server may include a backup server connected to the host computer 30 and the storage apparatus 10 via a communication network.
  • the host computer 30 stores the file, and transmits a write request for the file to be backed up from the stored file to the backup server.
  • the backup server receives a file from the host computer 30, creates backup data based on the received file, and transmits it to the storage device 10. Further, the host computer 30 transmits a file read request to the backup server.
  • the backup server reads backup data including the specified file from the storage apparatus 10 and transmits the specified file to the host computer 30.
  • FIG. 15 shows a functional configuration of the storage apparatus 10.
  • the node 40 includes a deduplication processing unit 310 and a file system management unit 320 as functions by programs in the memory 120.
  • the deduplication processing unit 310 receives the backup data 410 from the host computer 30 and performs deduplication processing on the backup data 410.
  • the file system management unit 320 creates a file system (FS) 330 using the LU provided from the disk array 60.
  • the file system management unit 320 writes processing data 420 that is backup data after deduplication processing to the file system 330 based on an instruction from the deduplication processing unit 310.
  • the file system management unit 320 reads the processing data 420 from the file system 330 based on an instruction from the deduplication processing unit 310.
  • the deduplication processing unit 310 reconstructs the backup data 410 from the processing data 420 and transmits it to the host computer 30.
  • the memory 120 further stores work data 440 for the deduplication processing unit 310.
  • the deduplication processing unit 310 uses the file system management unit 320 to acquire a setting value from the setting data 460 in the file system 330.
  • the setting data 460 may include any one of a storage chunk number upper limit value, a storage chunk number lower limit value, an interval evaluation threshold value, a distance evaluation threshold value, a separation chunk number threshold value, and a stage container number upper limit value.
  • the management computer connected to the node 40, the node 40, the host computer 30, etc. may write the setting data 460.
  • The setting data 460 may be written in the memory 120, the HDD 140, or the like.
  • the disk array 60 is file-accessed from the node 40, but may be block-accessed from the node 40.
  • the chunk may be a block.
  • the controller 210 may have the function of the node 40.
  • FIG. 16 shows the configuration of the processing data 420.
  • the processing data 420 includes a plurality of contents 510, a plurality of container indexes 520, a plurality of containers 530, and a plurality of chunk indexes 540.
  • the content 510 for reconstructing one backup data includes a content ID 610 indicating the content, and a plurality of chunk information 620 indicating a plurality of chunks included in the content.
  • Chunk information 620 indicating one chunk includes an offset 621 from the beginning of the backup data to the chunk, a length 622 of the chunk, a container ID 623 of the container 530 including the chunk, and a fingerprint 624 of the chunk.
  • the fingerprint 624 is a hash value obtained from the chunk.
  • the deduplication processing unit 310 can identify a chunk included in the backup data and a container including the chunk based on the content 510.
  • The fingerprint 624 is a value that is much shorter than the chunk data from which it is obtained. As a result, searching for a chunk by its fingerprint is faster than searching by comparing chunk data.
  • the container index 520 indicating one container includes a container ID 630 indicating the container and at least one chunk information 640 indicating the chunk included in the container.
  • the chunk information 640 indicating one chunk includes a fingerprint 641 of the chunk, an offset 642 from the top of the container to the chunk, and a length 643 of the chunk.
  • the deduplication processing unit 310 can identify the chunk included in the container based on the container index 520.
  • the container 530 includes a container ID 650 indicating the container and at least one chunk information 660 indicating the chunk included in the container.
  • the chunk information 660 indicating one chunk includes a length 661 of the chunk and chunk data 662 that is data of the chunk.
  • the beginning of the chunk data 662 in the container 530 is indicated by an offset 642 in the container index 520.
  • the length of the chunk data 662 in the container 530 is indicated by a length 643 in the container index 520 and a length 661 in the container 530.
  • the deduplication processing unit 310 can acquire the chunk data 662 from the container 530.
  • the chunk index 540 indicating a group of chunks that share a part of the fingerprint includes a group ID 670 indicating a part of the fingerprint and at least one chunk information 680 indicating a chunk belonging to the group.
  • the chunk information 680 indicating one chunk includes a fingerprint 681 of the chunk and a container ID 682 indicating a container to which the chunk belongs.
  • The chunk index 540 of the present embodiment includes the chunk information 680 of the chunks in registered containers and does not include the chunk information 680 of the chunks in unregistered containers. As a result, an unregistered container is never found by a subsequent duplication determination and is therefore not staged.
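The relationship between the content 510, the container index 520, and the container 530 can be sketched with a few small data classes. This is a simplified illustration, not the on-disk layout: the class and field names are hypothetical, the container is modeled as a single byte buffer rather than per-chunk length and data records, and SHA-1 is assumed as the fingerprint hash although the text only says that the fingerprint is a hash value.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentChunkInfo:          # corresponds to chunk information 620
    offset: int                  # offset from the beginning of the backup data
    length: int
    container_id: str
    fingerprint: bytes


@dataclass
class Content:                   # corresponds to content 510
    content_id: str
    chunks: List[ContentChunkInfo] = field(default_factory=list)


@dataclass
class ContainerIndexEntry:       # corresponds to chunk information 640
    fingerprint: bytes
    offset: int                  # offset from the top of the container
    length: int


@dataclass
class ContainerIndex:            # corresponds to container index 520
    container_id: str
    entries: List[ContainerIndexEntry] = field(default_factory=list)


@dataclass
class Container:                 # simplified stand-in for container 530
    container_id: str
    data: bytearray = field(default_factory=bytearray)


def fingerprint_of(chunk_data):
    return hashlib.sha1(chunk_data).digest()   # assumed hash function


def reconstruct(content, container_indexes, containers):
    """Rebuild backup data by following content -> container index -> container."""
    out = bytearray()
    for info in sorted(content.chunks, key=lambda c: c.offset):
        index = container_indexes[info.container_id]
        entry = next(e for e in index.entries if e.fingerprint == info.fingerprint)
        data = containers[info.container_id].data
        out += data[entry.offset:entry.offset + entry.length]
    return bytes(out)


container = Container("CN1")
index = ContainerIndex("CN1")
for chunk in (b"hello", b"world"):
    index.entries.append(
        ContainerIndexEntry(fingerprint_of(chunk), len(container.data), len(chunk)))
    container.data += chunk

content = Content("BD1")
for offset, chunk in ((0, b"hello"), (5, b"world"), (10, b"hello")):
    content.chunks.append(
        ContentChunkInfo(offset, len(chunk), "CN1", fingerprint_of(chunk)))

restored = reconstruct(content, {"CN1": index}, {"CN1": container})
```

The duplicated chunk appears twice in the content but only once in the container, which is how deduplicated data can still be restored in full.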
  • the deduplication processing unit 310 instructs the file system management unit 320 to write out the processing data 420.
  • the file system management unit 320 writes each of the plurality of contents 510, the plurality of container indexes 520, the plurality of containers 530, and the plurality of chunk indexes 540 to the disk array 60 as files.
  • FIG. 17 shows the backup process
  • the backup data may be a file or may include a plurality of files.
  • the deduplication processing unit 310 performs backup initial processing for backup initialization (S110).
  • the deduplication processing unit 310 opens a work container that is a new container (S120).
  • the deduplication processing unit 310 creates an updatable work container in the work data 440.
  • the deduplication processing unit 310 opens a work container index which is a new container index corresponding to the work container (S130).
  • the deduplication processing unit 310 creates an updatable work container index in the work data 440.
  • the deduplication processing unit 310 opens work content that is content corresponding to the backup data from the processing data 420 (S140).
  • the deduplication processing unit 310 copies the work content from the processing data 420 on the disk array 60 to the work data 440 on the memory 120 so that the work content can be updated.
  • the above is the backup initial processing.
  • the deduplication processing unit 310 divides the backup data into a plurality of chunks and gives chunk numbers to the plurality of divided chunks (S150).
  • The deduplication processing unit 310 selects a work chunk number in order from the plurality of chunk numbers, performs chunk write processing (described later) on the work chunk, which is the chunk having the work chunk number (S160), and determines whether chunk write processing has been performed on all of the divided chunks (S170). When it determines that chunk write processing has not been performed on all the chunks (S170: no), the deduplication processing unit 310 returns to S160 and selects the next work chunk number.
  • When it determines that chunk write processing has been performed on all the chunks (S170: yes), the deduplication processing unit 310 performs backup end processing for ending the backup (S180) and ends this flow.
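The chunk division and the per-chunk loop of the backup flow (S150 to S170) can be sketched as follows. Fixed-length chunking is assumed here, and the function names and the `write_chunk` callback are hypothetical placeholders for the chunk write processing.

```python
def split_into_chunks(backup_data, chunk_length):
    """Divide backup data into fixed-length chunks and number them from 0 (S150)."""
    starts = range(0, len(backup_data), chunk_length)
    return [(number, backup_data[start:start + chunk_length])
            for number, start in enumerate(starts)]


def run_backup(backup_data, chunk_length, write_chunk):
    """Call write_chunk for each chunk in chunk-number order (S160, S170)."""
    for number, chunk in split_into_chunks(backup_data, chunk_length):
        write_chunk(number, chunk)


processed = []
run_backup(b"abcdefgh", 3, lambda number, chunk: processed.append((number, chunk)))
```

The backup initial processing and the backup end processing are omitted; the sketch covers only the ordering guarantee that chunks are handled by ascending chunk number.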
  • the deduplication processing unit 310 closes the work content (S210).
  • the deduplication processing unit 310 replaces the work content in the processing data 420 with the work content in the work data 440.
  • the deduplication processing unit 310 closes the work container index (S220).
  • The deduplication processing unit 310 adds the work container index in the work data 440 to the processing data 420.
  • the deduplication processing unit 310 validates the close flag (S230).
  • the deduplication processing unit 310 stores a close flag on the memory 120. The close flag indicates whether to close the work container.
  • the deduplication processing unit 310 performs container close processing (described later) (S240).
  • the deduplication processing unit 310 adds the work container in the work data 440 to the process data 420.
  • the above is the backup end process.
  • the file system management unit 320 reads a file from the disk array 60 to the memory 120 in accordance with an instruction to open the file in the processing data 420 by the deduplication processing unit 310. Further, the file system management unit 320 writes the file from the memory 120 to the disk array 60 in accordance with the instruction to close the file in the processing data 420 by the deduplication processing unit 310.
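  • The main backup loop (S150 to S170) can be sketched as follows. This is only an illustrative sketch: the text does not say how the backup data is divided, so fixed-size chunking is an assumption, and the names `split_into_chunks`, `run_backup`, and `write_chunk` are hypothetical.

```python
from typing import Callable, List


def split_into_chunks(backup_data: bytes, chunk_size: int) -> List[bytes]:
    """S150: divide the backup data into chunks (fixed size is an assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [backup_data[i:i + chunk_size]
            for i in range(0, len(backup_data), chunk_size)]


def run_backup(backup_data: bytes, chunk_size: int,
               write_chunk: Callable[[int, bytes], None]) -> int:
    """S150-S170: number the chunks and write them in order.

    `write_chunk` stands in for the chunk writing process (S160).
    Returns the number of chunks processed.
    """
    chunks = split_into_chunks(backup_data, chunk_size)
    # The chunk number is simply the position of the chunk in the backup data.
    for chunk_number, chunk in enumerate(chunks):
        write_chunk(chunk_number, chunk)  # repeated until S170: yes
    return len(chunks)
```

  • The chunk number doubles as the position of the chunk in the backup data, which is what the later section setting (head and tail) relies on.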
  • FIG. 18 shows the chunk write processing.
  • The deduplication processing unit 310 acquires the designated work chunk (S310). Thereafter, the deduplication processing unit 310 determines whether or not the work chunk duplicates a staged chunk (S320). Here, when the chunk data of the work chunk is the same as any of the chunk data in a container staged in the work data 440, the deduplication processing unit 310 determines that the work chunk duplicates a staged chunk. Thereby, the deduplication processing unit 310 can determine at high speed whether the work chunk duplicates a chunk staged in the memory 120.
  • When it is determined that the work chunk duplicates a staged chunk (S320: yes), the deduplication processing unit 310 shifts the process to S410.
  • When it is determined that the work chunk does not duplicate a staged chunk (S320: no), the deduplication processing unit 310 determines whether the work chunk is registered in the chunk index 540 (S330).
  • Here, the deduplication processing unit 310 calculates the fingerprint of the work chunk and, when it detects that fingerprint in the chunk index 540, determines that the work chunk is registered in the chunk index 540.
  • The deduplication processing unit 310 identifies the chunk indicated by the detected fingerprint 681 as a conforming chunk, and identifies the container that is indicated by the container ID 682 and includes the conforming chunk as a conforming container. As a result, the deduplication processing unit 310 can determine whether or not a conforming chunk that may duplicate the work chunk is stored in the disk array 60. Note that the deduplication processing unit 310 may speed up the determination in S330 by using a Bloom filter.
  • When it is determined that the work chunk is not registered in the chunk index 540 (S330: no), the deduplication processing unit 310 shifts the process to S360.
  • When it is determined that the work chunk is registered in the chunk index 540 (S330: yes), the deduplication processing unit 310 performs a conforming container stage process (described later) for the conforming container (S340).
  • Thereafter, the deduplication processing unit 310 determines whether or not the work chunk is included in the conforming container (S350).
  • Here, the deduplication processing unit 310 determines that the work chunk is included in the conforming container when the chunk data of the work chunk is the same as the chunk data of the conforming chunk in the conforming container staged by the stage process.
  • Thereby, the deduplication processing unit 310 can determine that the work chunk duplicates a chunk stored in the processing data 420 in the disk array 60.
  • When it is determined that the work chunk is not included in the conforming container (the work chunk is a new chunk) (S350: no), the deduplication processing unit 310 performs a new chunk storage process (described later) (S360), and shifts the process to S410.
  • The deduplication processing unit 310 performs the container close determination process (described later) (S410), and then performs the container close process (described later) (S420).
  • The deduplication processing unit 310 determines whether or not the work container is closed (S430). When it is determined that the work container is not closed (S430: no), the deduplication processing unit 310 shifts the process to S450. When it is determined that the work container is closed (S430: yes), the deduplication processing unit 310 opens a work container that is a new container (S440).
  • The deduplication processing unit 310 registers the work chunk information in the work content in the work data 440 (S450), and ends this flow.
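  • The duplicate determination in S320 to S360 can be illustrated with the following minimal sketch. It is not the claimed implementation: SHA-1 as the fingerprint function is an assumption (the text only says that a fingerprint is calculated), the class `DedupSketch` and its fields are hypothetical, and the staging limit, close processing, and content registration (S410 to S450) are left out.

```python
import hashlib
from typing import Dict, List, Optional


def fingerprint(chunk_data: bytes) -> str:
    # SHA-1 is an assumption; the text does not name the hash function.
    return hashlib.sha1(chunk_data).hexdigest()


class DedupSketch:
    def __init__(self) -> None:
        self.disk: Dict[int, List[bytes]] = {}    # container ID -> chunk data on disk
        self.staged: Dict[int, List[bytes]] = {}  # containers staged in memory
        self.chunk_index: Dict[str, int] = {}     # fingerprint -> container ID
        self.work_container: List[bytes] = []     # new chunks not yet closed

    def find_duplicate(self, chunk_data: bytes) -> Optional[str]:
        # S320: compare against chunk data already staged in memory.
        for chunks in self.staged.values():
            if chunk_data in chunks:
                return "staged"
        # S330: look the fingerprint up in the chunk index.
        container_id = self.chunk_index.get(fingerprint(chunk_data))
        if container_id is not None:
            # S340: stage the conforming container (no eviction in this sketch).
            self.staged.setdefault(container_id, self.disk[container_id])
            # S350: confirm with the actual chunk data, not only the fingerprint.
            if chunk_data in self.staged[container_id]:
                return "disk"
        return None

    def write_chunk(self, chunk_data: bytes) -> str:
        where = self.find_duplicate(chunk_data)
        if where is None:
            self.work_container.append(chunk_data)  # S360: store as a new chunk
            return "new"
        return where
```

  • Staging the whole conforming container, rather than a single chunk, is what lets neighbouring chunks of the same backup hit the fast in-memory check in S320 on subsequent writes.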
  • FIG. 19 shows the conforming container stage process.
  • The deduplication processing unit 310 determines whether or not the conforming container is staged (S510). Thereby, even if there is a period between S320 and S340 during which the work data 440 is not locked, the deduplication processing unit 310 can confirm that the conforming container is not staged.
  • When it is determined that the conforming container is staged (S510: yes), the deduplication processing unit 310 ends this flow.
  • When it is determined that the conforming container is not staged (S510: no), the deduplication processing unit 310 determines whether or not the number of staged containers has reached the upper limit value of the number of staged containers (S520).
  • When it is determined that the number of staged containers has not reached the upper limit value (S520: no), the deduplication processing unit 310 shifts the process to S540.
  • When it is determined that the number of staged containers has reached the upper limit value (S520: yes), the deduplication processing unit 310 selects the oldest staged container among the staged containers and destages the selected container (invalidates the container on the memory) (S530). Thereafter, the deduplication processing unit 310 stages the conforming container (S540), and ends this flow. At this time, the deduplication processing unit 310 instructs the file system management unit 320 to stage the conforming container.
  • The file system management unit 320 reads the container from the disk array 60 to the work data 440 in the memory 120 in accordance with the instruction.
  • With this process, when the number of staged containers reaches the upper limit value, the storage apparatus 10 can replace the oldest staged container in the memory 120 with the conforming container read from the disk array 60.
  • Note that the deduplication processing unit 310 may measure the amount of containers staged in the memory 120 and determine whether or not the measured amount is equal to or greater than a predetermined stage amount upper limit value.
  • The amount of containers staged in the memory 120 may be the number of containers staged in the memory 120, the total size of the containers staged in the memory 120, or the total number of chunks in the containers staged in the memory 120.
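  • The stage process of FIG. 19 behaves like a bounded first-in, first-out cache of containers. The sketch below is a hedged illustration under that reading: `StageArea`, `max_staged`, and `read_from_disk` are hypothetical names, and the limit is counted in containers, which is only one of the measures the text allows.

```python
from collections import OrderedDict
from typing import Callable, List


class StageArea:
    """Bounded set of staged containers; the oldest staged one is evicted first."""

    def __init__(self, max_staged: int,
                 read_from_disk: Callable[[int], List[bytes]]) -> None:
        if max_staged < 1:
            raise ValueError("max_staged must be at least 1")
        self.max_staged = max_staged
        self.read_from_disk = read_from_disk  # stands in for the file system read
        self.staged: "OrderedDict[int, List[bytes]]" = OrderedDict()

    def stage(self, container_id: int) -> List[bytes]:
        if container_id in self.staged:            # S510: yes, nothing to do
            return self.staged[container_id]
        if len(self.staged) >= self.max_staged:    # S520: yes
            self.staged.popitem(last=False)        # S530: destage the oldest
        self.staged[container_id] = self.read_from_disk(container_id)  # S540
        return self.staged[container_id]
```

  • A hit in S510 does not refresh the position of the container, so eviction follows staging order rather than recency of use, matching the "oldest staged container" wording.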
  • FIG. 20 shows the new chunk storage process.
  • The deduplication processing unit 310 determines whether or not the number of stored chunks has reached the upper limit value of the number of stored chunks (S610). The condition that the number of stored chunks has reached this upper limit value is sometimes called the stored chunk number condition.
  • When it is determined that the number of stored chunks has not reached the upper limit value (S610: no), the deduplication processing unit 310 shifts the process to S710.
  • When it is determined that the number of stored chunks has reached the upper limit value (S610: yes), the deduplication processing unit 310 enables the close flag (S620). As a result, the deduplication processing unit 310 can limit the number of stored chunks. Thereafter, the deduplication processing unit 310 performs container close processing (described later) (S630). Thereby, the work container is closed. Thereafter, the deduplication processing unit 310 opens a new container as the work container (S640).
  • The deduplication processing unit 310 stores the work chunk in the work container (S710). Thereafter, the deduplication processing unit 310 performs section setting processing (S720).
  • In the section setting process, the deduplication processing unit 310 records the work chunk number in the tail (S730). Thereafter, the deduplication processing unit 310 determines whether a chunk number is recorded in the head (S740). When it is determined that a chunk number is recorded in the head (S740: yes), the deduplication processing unit 310 ends the section setting process. On the other hand, when it is determined that no chunk number is recorded in the head (S740: no), the deduplication processing unit 310 records the work chunk number in the head (S750), and ends the section setting process. The above is the section setting process.
  • The deduplication processing unit 310 registers the chunk information 640 of the work chunk in the work container index (S760), and ends this flow.
  • With this process, the storage apparatus 10 can store the work chunk in the work container and record the head and tail.
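  • The new chunk storage and section setting steps can be sketched as follows. This is an illustrative sketch only: `WorkContainer`, `store_new_chunk`, and the `closed` list are hypothetical, container close processing is reduced to appending the full container to `closed`, and registration in the work container index (S760) is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkContainer:
    chunks: List[bytes] = field(default_factory=list)
    head: Optional[int] = None  # chunk number of the first stored chunk
    tail: Optional[int] = None  # chunk number of the most recently stored chunk


def store_new_chunk(container: WorkContainer, chunk_number: int,
                    chunk_data: bytes, max_chunks: int,
                    closed: List[WorkContainer]) -> WorkContainer:
    """Store one new chunk and return the (possibly new) work container."""
    if len(container.chunks) >= max_chunks:  # S610: yes
        closed.append(container)             # S620-S630: close the full container
        container = WorkContainer()          # S640: open a new work container
    container.chunks.append(chunk_data)      # S710
    container.tail = chunk_number            # S730
    if container.head is None:               # S740: no
        container.head = chunk_number        # S750
    return container
```

  • After several calls, `head` and `tail` bracket the range of chunk numbers held by the container, which is the information the later close determination can use to judge how far apart the stored chunks are in the backup data.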
  • FIG. 21 shows container close determination processing.
  • The deduplication processing unit 310 determines whether or not the distance evaluation value is larger than the distance evaluation threshold (S810). When it is determined that the distance evaluation value is equal to or less than the distance evaluation threshold (S810: no), the deduplication processing unit 310 ends this flow. On the other hand, when it is determined that the distance evaluation value is greater than the distance evaluation threshold (S810: yes), the deduplication processing unit 310 determines whether the number of stored chunks is equal to or greater than the separation chunk number threshold (S820). When it is determined that the number of stored chunks is smaller than the separation chunk number threshold (S820: no), the deduplication processing unit 310 ends this flow. On the other hand, when it is determined that the number of stored chunks is equal to or greater than the separation chunk number threshold (S820: yes), the deduplication processing unit 310 enables the close flag (S830) and ends this flow.
  • With this process, the storage apparatus 10 can suppress a decrease in the deduplication rate caused by the container close process described later, by setting a container that includes a plurality of chunks whose positions in the backup data are close to each other as a registered container. Further, by using the separation chunk number threshold, the storage apparatus 10 can keep the number of stored chunks of each container at or above the separation chunk number threshold.
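  • The two-step test of FIG. 21 reduces to a small predicate, sketched below. The function name and parameter names are hypothetical, and the distance evaluation value is taken as an input because its calculation is not described in this part of the text.

```python
def should_set_close_flag(distance_value: float, distance_threshold: float,
                          stored_chunks: int,
                          separation_chunk_threshold: int) -> bool:
    """Return True when the close flag should be enabled (S830)."""
    if distance_value <= distance_threshold:        # S810: no
        return False
    if stored_chunks < separation_chunk_threshold:  # S820: no
        return False
    return True                                     # S820: yes -> S830
```

  • Both conditions must hold: a large distance alone does not close a container that still holds fewer chunks than the separation chunk number threshold.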
  • FIG. 22 shows the container close process.
  • The deduplication processing unit 310 determines whether or not the close flag is valid (S910). When it is determined that the close flag is invalid (S910: no), the deduplication processing unit 310 ends this flow. On the other hand, when it is determined that the close flag is valid (S910: yes), the deduplication processing unit 310 performs a registered container determination process (S920).
  • In the registered container determination process, the deduplication processing unit 310 determines whether or not the interval evaluation value is smaller than the interval evaluation threshold (S930). When it is determined that the interval evaluation value is greater than or equal to the interval evaluation threshold (S930: no), the deduplication processing unit 310 ends the registered container determination process. On the other hand, when it is determined that the interval evaluation value is smaller than the interval evaluation threshold (S930: yes), the deduplication processing unit 310 determines whether or not the number of stored chunks is equal to or greater than the storage chunk number lower limit (S940).
  • When it is determined that the number of stored chunks is smaller than the storage chunk number lower limit (S940: no), the deduplication processing unit 310 ends the registered container determination process.
  • When it is determined that the number of stored chunks is equal to or greater than the storage chunk number lower limit (S940: yes), the deduplication processing unit 310 registers the chunk information 680 of all the chunks in the work container in the chunk index 540 (S950), and ends the registered container determination process.
  • Thereafter, the deduplication processing unit 310 closes the work container (S960), and ends this flow. At this time, the deduplication processing unit 310 instructs the file system management unit 320 to destage the work container. In response to the instruction, the file system management unit 320 writes the work container from the memory 120 to the disk array 60 and invalidates the work container on the memory 120.
  • With this process, the storage apparatus 10 can close the work container when the work container satisfies either the separation condition or the storage chunk number condition. Further, by treating a container that includes new chunks whose positions in the backup data are far apart from each other, and a container having a small number of stored chunks, as non-registered containers, the storage apparatus 10 can reduce the number of stages and destages and thereby suppress a decrease in its performance. Further, by using the lower limit value of the stored chunk number, the storage apparatus 10 can keep the number of stored chunks of each registered container equal to or greater than that lower limit value.
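  • The close process of FIG. 22 can be sketched as follows. This is a hedged illustration: `close_container` and its parameters are hypothetical, the interval evaluation value is supplied by the caller, SHA-1 fingerprints are an assumption, and plain dictionaries stand in for the chunk index 540 and the disk array 60.

```python
import hashlib
from typing import Dict, List


def close_container(close_flag: bool, container_id: int, chunks: List[bytes],
                    interval_value: float, interval_threshold: float,
                    min_chunks: int, chunk_index: Dict[str, int],
                    disk: Dict[int, List[bytes]]) -> bool:
    """Close the work container if the close flag is set; return True if closed."""
    if not close_flag:                                   # S910: no
        return False
    # S920-S950: only a dense, sufficiently full container becomes a
    # registered container whose chunks are added to the chunk index.
    if interval_value < interval_threshold and len(chunks) >= min_chunks:
        for data in chunks:
            # SHA-1 as the fingerprint is an assumption of this sketch.
            chunk_index[hashlib.sha1(data).hexdigest()] = container_id
    disk[container_id] = list(chunks)                    # S960: destage to disk
    return True
```

  • A non-registered container is still written to disk so that its data can be read back, but because its chunks never enter the chunk index it is never staged again for duplicate determination.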
  • FIG. 23 shows the read process.
  • When the deduplication processing unit 310 receives a backup data read request from the host computer 30, the deduplication processing unit 310 starts the read process.
  • The deduplication processing unit 310 opens the work content, which is the content corresponding to the backup data, in the processing data 420 (S2110). Thereafter, the deduplication processing unit 310 selects chunks from the work content as selected chunks in the order of offset 621, and acquires the chunk information 620 of each selected chunk (S2120). Thereafter, the deduplication processing unit 310 identifies the container ID 623 of the container including the selected chunk and the fingerprint 624 of the selected chunk, identifies the offset 642 and length 643 in the container index 520 corresponding to the identified container ID, and reads the chunk data 662 of the identified offset and length from the container 530 corresponding to the identified container ID (S2130). Thereafter, the deduplication processing unit 310 determines whether all chunks in the work content have been read (S2140).
  • When it is determined that not all chunks in the work content have been read (S2140: no), the deduplication processing unit 310 shifts the process to S2120 and selects the next selected chunk. When it is determined that all chunks in the work content have been read (S2140: yes), the deduplication processing unit 310 closes the work content (S2150), and ends this flow.
  • With this process, the storage apparatus 10 can reconstruct the backup data designated by the host computer 30 from the processing data 420 and transmit the reconstructed backup data to the host computer 30.
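  • The reconstruction in S2120 to S2140 amounts to a two-level lookup, sketched below under simplifying assumptions: `ContentEntry` and `read_backup` are hypothetical names, each container is a single byte string, and each container index is a dictionary from fingerprint to (offset, length).

```python
from typing import Dict, List, NamedTuple, Tuple


class ContentEntry(NamedTuple):
    offset: int        # position of the chunk in the backup data (offset 621)
    container_id: int  # container holding the chunk data (container ID 623)
    fingerprint: str   # fingerprint of the chunk (fingerprint 624)


def read_backup(content: List[ContentEntry],
                container_indexes: Dict[int, Dict[str, Tuple[int, int]]],
                containers: Dict[int, bytes]) -> bytes:
    """Rebuild the backup data from the content and the containers."""
    out = bytearray()
    for entry in sorted(content, key=lambda e: e.offset):  # S2120
        # Container index: fingerprint -> (offset 642, length 643).
        offset, length = container_indexes[entry.container_id][entry.fingerprint]
        out += containers[entry.container_id][offset:offset + length]  # S2130
    return bytes(out)
```

  • Because the content stores references rather than data, a chunk that occurs several times in the backup is stored once in a container and simply read again for each content entry.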
  • The storage device 90, the storage device 10, or the like may be used.
  • As the host computer, the host computer 80, the host computer 30, or the like may be used.
  • A memory 91, a memory 120, or the like may be used as the memory.
  • As the processor, the CPU 110 or the like may be used.
  • A disk 92, a disk array 60, or the like may be used.
  • A fingerprint or the like may be used as the chunk information.
  • A chunk index 540 or the like may be used.
  • As the closing condition, a separation condition, a storage chunk number condition, or the like may be used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention reduces the processing for replacing data in a memory with data in a storage device, and thereby improves the performance of the storage unit. When it is determined that a work chunk is not identical to a matching chunk, a processor according to the present invention places the work chunk in a work container. The processor determines whether or not the work container satisfies a predetermined closing condition and, if it is determined that the work container satisfies the closing condition, determines whether or not the work container satisfies a predetermined registration condition. If it is determined that the work container satisfies the registration condition, the processor causes chunk information based on each chunk in the work container to be included in index information. If it is determined that the work container satisfies the closing condition, the processor destages the work container to the storage device.
PCT/JP2014/080513 2014-11-18 2014-11-18 Storage unit, file server, and data storage method WO2016079809A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/080513 WO2016079809A1 (fr) 2014-11-18 2014-11-18 Storage unit, file server, and data storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/080513 WO2016079809A1 (fr) 2014-11-18 2014-11-18 Storage unit, file server, and data storage method

Publications (1)

Publication Number Publication Date
WO2016079809A1 true WO2016079809A1 (fr) 2016-05-26

Family

ID=56013424

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/080513 WO2016079809A1 (fr) 2014-11-18 2014-11-18 Storage unit, file server, and data storage method

Country Status (1)

Country Link
WO (1) WO2016079809A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100223441A1 (en) * 2007-10-25 2010-09-02 Mark David Lillibridge Storing chunks in containers
WO2014030252A1 (fr) * 2012-08-24 2014-02-27 株式会社日立製作所 Storage device and data management method
WO2014155653A1 (fr) * 2013-03-29 2014-10-02 株式会社日立製作所 Data duplication detection system and method for controlling data duplication detection system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14906465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14906465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP