US20170206011A1 - Enhancing performance-cost ratio of a primary storage adaptive data reduction system - Google Patents

Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Info

Publication number
US20170206011A1
Authority
US
United States
Prior art keywords
data
storage system
data reduction
attributes
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/479,172
Inventor
David D. Chambliss
Mihail C. Constantinescu
Joseph S. Glider
Maohua Lu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US15/479,172
Publication of US20170206011A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G06F17/30303
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G06F17/30153
    • G06F17/30156

Definitions

  • The present invention relates generally to data reduction in storage systems, and more particularly to enhancing the performance-cost ratio of a primary storage adaptive data reduction system.
  • Storage systems are utilized in information technology environments such as enterprise computing systems. Because information technology system managers are generally interested in increasing data storage efficiency, many modern storage systems provide data reduction for increasing storage efficiency and storage savings. Data reduction techniques are increasingly used to store primary data in less storage space. Examples of such data reduction techniques include compression and deduplication. Data reduction requires a large amount of processing resources, such as processor cycles (MIPS) and memory bus bandwidth, to perform the necessary mathematical transformations on the data, resulting in a higher cost/performance ratio.
  • Embodiments of the present invention provide data reduction for a storage system.
  • One embodiment comprises determining attributes of data for storage in the storage system. Then expected data reduction effectiveness for the data is determined based on said attributes. Said effectiveness indicates the benefit that data reduction is expected to provide for the data based on said attributes. Data reduction is applied to the data based on the expected data reduction effectiveness as weighed against performance impact, to improve resource usage efficiency.
  • applying data reduction comprises selectively applying data reduction to the data based on the data reduction effectiveness and available processing resources.
  • a type of data reduction for the data is determined based on the data attributes. In one implementation, determining a type of data reduction for the data comprises determining a type of data reduction that is most effective in reducing the data based on the data attributes. In one embodiment, selectively applying data reduction comprises, based on the data reduction effectiveness, storing the data without applying data reduction.
  • the present invention provides a storage system comprising a file-write serving module that is invoked in response to a data write, and based on data attributes determines expected data reduction effectiveness for the data based on said attributes.
  • the storage system further comprises an adaptive data reduction module that receives write data, and based on the expected data reduction effectiveness, selectively applies data reduction transformations to the data to achieve best data reduction effectiveness and performance, given limited processing resources.
  • the adaptive data reduction module balances the data reduction effectiveness with processing resource state, to determine amount of data processing resources to allocate for data reduction transformations to the data.
  • FIG. 1 is a block diagram illustrating an example of a network environment for adaptive data reduction in a storage system, according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating an example of a server utilizing an adaptive data reduction process, according to an embodiment of the present invention, as shown in FIG. 1.
  • FIG. 3 shows a block diagram of components of an adaptive data reduction system for a file system of an information technology environment, according to an embodiment of the invention.
  • FIG. 4A shows a block diagram of components of a storage system including an adaptive data reduction system, according to an embodiment of the invention.
  • FIG. 4B shows a block diagram of components of a storage system including an adaptive data reduction system, according to another embodiment of the invention.
  • FIG. 4C shows a block diagram of components of a storage system including an adaptive data reduction system, according to another embodiment of the invention.
  • FIG. 5 shows a flowchart illustrating an adaptive data reduction process, according to an embodiment of the present invention.
  • Embodiments of the invention relate to enhancing performance-cost ratio of a primary storage adaptive data reduction system.
  • An embodiment of the invention allows performance enhancement for storage systems by selectively applying data reduction to data.
  • Data reduction techniques require a large amount of processor (e.g., CPU) resources that might otherwise be used for processing I/O requests, and therefore cause the processor utilization to become a system bottleneck which delays the servicing of other I/Os and lowers the overall throughput.
  • applying data reduction techniques selectively lowers the overall use of processor resources for data reduction per I/O, while ensuring that the system retains data reduction effectiveness by using processor resources for data reduction only on the data that will benefit the most, leaving more processor resources for processing other I/O requests, thereby increasing overall throughput.
  • An embodiment of the invention provides performance enhancement functions at different layers in a storage stack.
  • a first performance enhancement function is based on access to specific data information which may be relevant to data reduction.
  • a second performance enhancement function selectively performs data reduction based on data reduction effectiveness for certain data.
  • the first performance enhancement function collects and summarizes data reduction effectiveness information for a unit of data such as a file, and provides that summary to the second performance enhancement function which based on the supplied information selectively allocates processing resources, if any, for data reduction to increase system throughput for storage efficiency and space-saving.
  • Different attributes (characteristics) of the data are used to determine if the data should be considered for data reduction.
  • the first performance enhancement function enables the second performance enhancement function to optimize data path utilization for storage efficiency.
  • data reduction effectiveness information can be shared, according to how the file-writing and data reduction processing components are organized in the system and the communication paths available to them.
  • a shared memory can be used to maintain a table of hints, wherein each hint includes a data identifier along with a suggestion of what data reductions would be effective and the likely effectiveness of the suggested techniques.
  • such a hint might be communicated as a message that accompanies the data, or as parameters in a command structure used to make an I/O request.
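  • As an illustration of the hint table described above, the following is a minimal sketch and not the patented implementation. The names `ReductionHint` and `HintTable`, the 0.0 to 1.0 effectiveness scale, and the example paths are assumptions introduced here; an ordinary in-process dictionary stands in for the shared memory.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ReductionHint:
    """One hint: a data identifier plus suggested reductions (hypothetical layout)."""
    data_id: str                    # identifier of the file or other data unit
    suggested: Tuple[str, ...]      # e.g. ("compress", "dedup"); empty means store as-is
    expected_effectiveness: float   # assumed scale: 0.0 (no benefit) to 1.0 (large benefit)


class HintTable:
    """In-process stand-in for the shared-memory table of hints."""

    def __init__(self) -> None:
        self._hints: Dict[str, ReductionHint] = {}

    def publish(self, hint: ReductionHint) -> None:
        # The file-writing side records (or overwrites) the hint for a data unit.
        self._hints[hint.data_id] = hint

    def lookup(self, data_id: str) -> Optional[ReductionHint]:
        # The data reduction side reads the hint; None means no hint was supplied.
        return self._hints.get(data_id)


table = HintTable()
table.publish(ReductionHint("vol1/report.txt", ("compress", "dedup"), 0.7))
table.publish(ReductionHint("vol1/photo.jpg", ("dedup",), 0.1))
```

  • The same `ReductionHint` record could equally be attached to the write as a message or passed as command parameters; only the transport would change.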
  • One example of such a class of files is files with smaller sizes.
  • Smaller files generally account for a very small volume of the used capacity of a storage system.
  • An embodiment of the invention selectively decreases allocation of processing resources for reducing the size of smaller files. This is because such file size reduction is unlikely to have a significant effect on data reduction effectiveness (especially where the per-kilobyte count of I/Os to smaller files is disproportionately high relative to I/Os to larger files).
  • small files (e.g., about 8 KB or less)
  • Another example of such a class of files is pre-compressed or otherwise uncompressible or encrypted files.
  • An embodiment of the invention selectively decreases allocation of processing resources for reducing size of pre-compressed or otherwise uncompressible or encrypted files.
  • Such files are unlikely to be compressible by generic techniques but would use considerable computing resources in an attempt to compress (more MIPS per byte are used than for more compressible data).
  • files known to have bad compression opportunity may be deduplicated but are not compressed.
  • Another example of such a class of files is short-lived files.
  • An embodiment of the invention selectively applies data reduction to the data based on longevity of data.
  • An implementation of the invention selectively decreases allocation of processing resources for reducing size of short-lived files.
  • Such files are likely to be rewritten and their blocks re-used, and will not contribute to long-term data reduction effectiveness.
  • files that will likely be short-lived may be compressed if bandwidth reduction to a backend disk layer is desirable, but deduplicating them may be undesirable if expected deduplication potential is low for such short-lived files.
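  • The three file classes above can be sketched as a simple attribute-based policy. This is an illustrative sketch only: the extension lists, the `FileAttributes` and `Plan` types, and the choice to skip reduction entirely for small files (the text only says resources are decreased) are assumptions made here, not details from the disclosure.

```python
import os
from dataclasses import dataclass

SMALL_FILE_BYTES = 8 * 1024                           # "about 8 KB or less" per the text
PRECOMPRESSED_EXTS = {".zip", ".gz", ".jpg", ".mp4"}  # hypothetical list
SHORT_LIVED_EXTS = {".tmp", ".swp"}                   # hypothetical list


@dataclass
class FileAttributes:
    name: str
    size_bytes: int
    encrypted: bool = False


@dataclass
class Plan:
    compress: bool
    dedup: bool


def plan_reduction(attrs: FileAttributes, save_backend_bandwidth: bool = False) -> Plan:
    """Pick data reduction steps from file attributes (simplified policy)."""
    ext = os.path.splitext(attrs.name)[1].lower()
    if attrs.size_bytes <= SMALL_FILE_BYTES:
        # Small files hold little capacity, so spend no reduction cycles on them.
        return Plan(compress=False, dedup=False)
    if attrs.encrypted or ext in PRECOMPRESSED_EXTS:
        # Poor compression opportunity: may be deduplicated but is not compressed.
        return Plan(compress=False, dedup=True)
    if ext in SHORT_LIVED_EXTS:
        # Short-lived: compress only if backend bandwidth reduction is wanted.
        return Plan(compress=save_backend_bandwidth, dedup=False)
    return Plan(compress=True, dedup=True)
```

  • For example, `plan_reduction(FileAttributes("backup.gz", 5_000_000))` yields a plan that deduplicates but does not compress.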
  • One embodiment of the invention comprises an adaptive data reduction system implementing an adaptive data reduction method for selectively allocating processing resources for data reduction for scenarios which result in significant data reduction, thereby maintaining (or improving) overall system performance.
  • FIG. 1 illustrates an example of the basic components of an information technology system 10 utilizing an adaptive data reduction system 100 , used in connection with a preferred embodiment of the present invention.
  • The system 10 includes a storage server 11 and the remote devices 15 and 17-20 that may utilize the adaptive data reduction system 100 of the present invention.
  • Illustrated in FIG. 2 is a block diagram demonstrating an example of the storage server 11, as shown in FIG. 1, utilizing the adaptive data reduction system 100 according to an embodiment of the present invention.
  • the adaptive data reduction system 100 utilizes a selective data reduction process for data reduction in a file system, according to an embodiment of the invention.
  • the adaptive data reduction system 100 may be implemented as a standalone system as shown in FIG. 1 .
  • data is selected for data reduction based on the nature (attributes) of the data and therefore the benefit that data reduction is expected to provide for that data, to improve resource usage efficiency as described herein.
  • Said expected benefit is based on expected effectiveness of data reduction processing for the data given the nature of the data.
  • the invention provides an adaptive data reduction system 100 implementing the abovementioned features, to constrain the amount of processing resources (e.g., MIPS and memory bandwidth) expended by a storage system, while continuing to effectively support data reduction or even other high resource usage functions.
  • the adaptive data reduction system 100 is useful with current storage platforms, and will lower the cost/terabyte of many configurations by allowing configurations that have more storage (e.g., disks) attached to a controller subsystem of the storage system.
  • the adaptive data reduction system monitors its own internal state such as processor usage and data paths, to be more selective or less selective about the amount of data reduction it performs. For example, for files that are marginally compressible, the adaptive data reduction system 100 may select to compress such files when processor load is light. The adaptive data reduction system 100 may forego file compression when processor load is heavy.
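  • The load-dependent selectivity described above can be sketched as a threshold that rises with processor load. The function name, the linear interpolation, and the threshold values are assumptions chosen for illustration; the text does not specify how selectivity is computed.

```python
def should_compress(expected_ratio: float, cpu_load: float,
                    base_threshold: float = 1.2,
                    max_threshold: float = 3.0) -> bool:
    """Decide whether to compress, becoming more selective as load rises.

    expected_ratio: original size divided by expected compressed size
                    (1.0 means no saving); an assumed effectiveness measure.
    cpu_load:       processor utilization in [0.0, 1.0].
    """
    load = min(max(cpu_load, 0.0), 1.0)  # clamp out-of-range readings
    # Hypothetical rule: interpolate the required ratio between the two bounds.
    threshold = base_threshold + (max_threshold - base_threshold) * load
    return expected_ratio >= threshold
```

  • Under these assumed thresholds, a marginally compressible file (ratio 1.3) is compressed at 5% load but skipped at 90% load, while a highly compressible file (ratio 5.0) is compressed in both cases.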
  • The adaptive data reduction system 100 comprises a detection module 101 that detects when files being written to storage are not likely to make a positive impact on data reduction effectiveness, and/or are likely to negatively impact system performance.
  • The adaptive data reduction system 100 further includes a controller module 102 that adapts the data path for servicing of such writes so as to selectively bypass data reduction (e.g., bypass either compression or deduplication, or both).
  • The adaptive data reduction system 100 further includes a reduction process module 103 that performs data reduction as determined by the controller module 102. Finally, data is stored in the data storage 100C.
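  • A minimal sketch of how modules 101, 102, and 103 might cooperate is shown below. The class names, the detection rule, the use of `zlib` for compression and SHA-256 content addressing for deduplication, and the dictionary standing in for data storage 100C are all assumptions made for illustration.

```python
import hashlib
import zlib
from typing import Dict

SMALL_FILE_BYTES = 8 * 1024                        # assumed cutoff
PRECOMPRESSED_SUFFIXES = (".gz", ".zip", ".jpg")   # hypothetical list


class DetectionModule:
    """Stand-in for detection module 101."""

    def unlikely_to_benefit(self, name: str, data: bytes) -> bool:
        return (len(data) <= SMALL_FILE_BYTES
                or name.lower().endswith(PRECOMPRESSED_SUFFIXES))


class ReductionProcessModule:
    """Stand-in for reduction process module 103 plus data storage 100C."""

    def __init__(self) -> None:
        self.storage: Dict[str, bytes] = {}
        self._next_raw_id = 0

    def store_raw(self, data: bytes) -> str:
        # Bypass path: no hashing or compression work is done.
        key = f"raw:{self._next_raw_id}"
        self._next_raw_id += 1
        self.storage[key] = data
        return key

    def store_reduced(self, data: bytes) -> str:
        # Content-addressed key gives deduplication; zlib gives compression.
        key = "z:" + hashlib.sha256(data).hexdigest()
        if key not in self.storage:
            self.storage[key] = zlib.compress(data)
        return key

    def read(self, key: str) -> bytes:
        blob = self.storage[key]
        return zlib.decompress(blob) if key.startswith("z:") else blob


class ControllerModule:
    """Stand-in for controller module 102: picks the data path per write."""

    def __init__(self, detector: DetectionModule,
                 reducer: ReductionProcessModule) -> None:
        self.detector = detector
        self.reducer = reducer

    def handle_write(self, name: str, data: bytes) -> str:
        if self.detector.unlikely_to_benefit(name, data):
            return self.reducer.store_raw(data)
        return self.reducer.store_reduced(data)
```

  • In this sketch, writing the same large log file twice returns the same key and stores one compressed copy, whereas a tiny file takes the bypass path and is stored unchanged.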
  • the present invention provides a storage system including an adaptive data reduction system 100 comprising a file-write serving module 110 which is invoked when a file write occurs and determines (by checking file characteristics such as file size and file type), whether the file data can benefit from data reduction techniques (e.g., compression, deduplication, etc).
  • the storage system further comprises a data reduction module 111 which receives write data.
  • the data reduction module 111 uses the information from the file-write serving module 110 (indicating whether the file characteristics suggest that the file will benefit from data reduction), to selectively apply data reduction transformations to the write data.
  • The data reduction module 111 simultaneously achieves the best data reduction effectiveness and performance for the write data, given limited processing resources, by choosing to perform data reduction only on the data that will achieve maximum benefit, while ensuring that the current I/O load has enough processor resources to provide expected performance. For example, the data reduction module 111 balances the information it receives from the file-write serving module 110 with its own internal state (e.g., the processing load on the data reduction module 111), to decide on allocation of processing resources for compression and deduplication transformations of the write data. Finally, data is stored in the data storage 100C.
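  • One way to picture the balancing performed by the data reduction module 111 is as a budgeted selection: spend a limited amount of processor time on the writes expected to save the most space per unit of processing. The sketch below uses a greedy heuristic, which is a swapped-in technique and not something the text specifies; `PendingWrite`, its fields, and the millisecond budget are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingWrite:
    data_id: str
    expected_saved_bytes: int   # assumed estimate derived from the supplied hints
    cpu_cost_ms: float          # assumed estimate of reduction cost; must be > 0


def select_for_reduction(writes: List[PendingWrite],
                         cpu_budget_ms: float) -> List[str]:
    """Greedily pick writes with the best expected savings per CPU millisecond.

    Writes that are not selected would be stored without data reduction.
    """
    ranked = sorted(writes,
                    key=lambda w: w.expected_saved_bytes / w.cpu_cost_ms,
                    reverse=True)
    chosen: List[str] = []
    spent = 0.0
    for w in ranked:
        if w.expected_saved_bytes <= 0:
            continue  # no expected benefit, never worth the cycles
        if spent + w.cpu_cost_ms <= cpu_budget_ms:
            chosen.append(w.data_id)
            spent += w.cpu_cost_ms
    return chosen
```

  • The budget would shrink as I/O load grows, so that fewer writes are reduced when processor resources are needed elsewhere. A greedy pass is simple but not guaranteed to be optimal.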
  • The file-write serving module 110 may be a component of a file server 100A, while the data reduction module 111 may be a component of a block storage server 100B.
  • The file-write serving module 110, the data reduction module 111, and the block storage layer 100B may all be components of a file server 100A.
  • Both the file-write serving module 110 and the data reduction module 111 may be components of a database server 100F, etc.
  • the invention operates similarly although the method of communicating the information between the two components 110 and 111 is different for the different storage system arrangements.
  • FIG. 5 is a flowchart of an adaptive data reduction process 50 for the adaptive data reduction system 100 in a storage system, according to an embodiment of the invention.
  • Process block 51 comprises detecting when a file is being written to storage.
  • Process block 52 comprises checking attributes of the file.
  • Process block 53 comprises determining expected data reduction effectiveness in saving storage space given performance impact.
  • One implementation comprises determining, based on the attributes, whether data reduction is likely to make a positive impact on data reduction effectiveness, according to information described in previous paragraphs, such as file size or, by file type, whether the file is likely to be uncompressible or short-lived.
  • Process block 54 comprises communicating the data reduction effectiveness determination to a data reduction controller module.
  • In a data reduction process 60 of a data reduction controller module, data reduction is selectively applied to the data based on the expected data reduction effectiveness as weighed against system performance impact.
  • Process block 55 comprises determining the type of data reduction (if any) for the file based on data reduction effectiveness.
  • In process block 56, if data reduction is to be applied based on the determining step in process block 55, then in process block 57 data reduction is applied to the file; otherwise, in process block 58, data reduction is skipped for the file (the file data is stored without data reduction).
  • the process 50 simultaneously achieves best data reduction effectiveness and performance for the write data, given limited processing resources.
  • Process block 55 balances the data reduction effectiveness information from process block 53 with processing load, to decide on allocation of processing resources for compression and deduplication transformations of the write data.
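  • The flow of process 50 can be sketched end to end as follows. The block numbers in the comments refer to FIG. 5 as described above; the scoring values, the 0.3 cutoff, the extension lists, and the use of `zlib` as the only reduction type are assumptions made so that the sketch is runnable.

```python
import os
import zlib
from typing import Dict

INCOMPRESSIBLE_EXTS = {".gz", ".zip", ".jpg"}   # hypothetical list
SHORT_LIVED_EXTS = {".tmp"}                     # hypothetical list


def check_attributes(name: str, data: bytes) -> dict:
    """Process block 52: gather attributes of the file being written."""
    return {"size": len(data), "ext": os.path.splitext(name)[1].lower()}


def expected_effectiveness(attrs: dict) -> float:
    """Process block 53: score in [0, 1]; the values are illustrative only."""
    if attrs["size"] <= 8 * 1024:
        return 0.1
    if attrs["ext"] in INCOMPRESSIBLE_EXTS or attrs["ext"] in SHORT_LIVED_EXTS:
        return 0.2
    return 0.8


def choose_reduction(effectiveness: float, cpu_load: float) -> str:
    """Process block 55: weigh expected benefit against processing load."""
    return "compress" if effectiveness * (1.0 - cpu_load) >= 0.3 else "none"


def write_file(name: str, data: bytes, cpu_load: float,
               storage: Dict[str, bytes]) -> str:
    """Invoked when a file write is detected (process block 51)."""
    attrs = check_attributes(name, data)        # block 52
    score = expected_effectiveness(attrs)       # block 53
    kind = choose_reduction(score, cpu_load)    # blocks 54-55: score passed to controller
    if kind == "compress":                      # block 56: apply reduction?
        storage[name] = zlib.compress(data)     # block 57: reduce, then store
    else:
        storage[name] = data                    # block 58: store without reduction
    return kind
```

  • With these assumed values, a large log file is compressed at 20% processor load but stored as-is at 90% load, and a `.gz` file is stored as-is regardless of load.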
  • Each of the remote devices 15 and 17-20 has applications and can have a local database 16.
  • Server 11 contains applications, and a database 12 that may be accessed by remote devices 15 and 17-20 via connections 14(A-F), respectively, over network 13.
  • The server 11 executes software for a computer network and controls access to itself and database 12.
  • The remote devices 15 and 17-20 may access the database 12 over a network 13, such as but not limited to: the Internet, a local area network (LAN), a wide area network (WAN), via a telephone line using a modem (POTS), Bluetooth, WiFi, WiMAX, cellular, optical, satellite, radio frequency (RF), Ethernet, magnetic induction, coax, RS-485, or other like networks.
  • The server 11 may also be connected to the LAN within an organization.
  • The remote devices 15 and 17-20 may each be located at remote sites.
  • Remote devices 15 and 17-20 include, but are not limited to, PCs, workstations, laptops, handheld computers, pocket PCs, PDAs, pagers, wireless application protocol (WAP) devices, non-WAP devices, cell phones, palm devices, printing devices and the like.
  • The remote devices 15 and 17-20 communicate over the network 13 to access the server 11 and database 12.
  • Third party computer systems 21 and databases 22 can access the server 11 (FIG. 2). Data that is obtained from third party computer systems 21 and database 22 can be stored on server 11 and database 12 in order to provide later access to a user on remote devices 15 and 17-20. It is also contemplated that, for certain types of data, the remote devices 15 and 17-20 can access the third party computer systems 21 and database 22 directly using the network 13.
  • the server 11 includes a storage system.
  • the server 11 includes a processor 41 , a computer readable medium such as memory 42 , and one or more input and/or output (I/O) devices (or peripherals) that are communicatively coupled via a local interface 43 .
  • the local interface 43 can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art.
  • the local interface 43 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface 43 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
  • the processor 41 is a hardware device for executing software that can be stored in memory 42 .
  • the processor 41 can be virtually any custom made or commercially available processor, a central processing unit (CPU), data signal processor (DSP) or an auxiliary processor among several processors associated with the server 11 , and a semiconductor based microprocessor (in the form of a microchip) or a microprocessor.
  • the memory 42 can include any one or combination of volatile memory elements (e.g., random access memory (RAM), such as dynamic random access memory (DRAM), static random access memory (SRAM), etc.) and nonvolatile memory elements (e.g., read only memory (ROM), erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.).
  • the memory 42 may incorporate electronic, magnetic, optical, and/or
  • the software in memory 42 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions.
  • The software in the memory 42 includes a suitable operating system (O/S) 51 and the adaptive data reduction system 100 of the present invention.
  • the adaptive data reduction system 100 comprises functional components and process blocks described herein.
  • the operating system 51 essentially controls the execution of other computer programs, such as the adaptive data reduction system 100 , and provides scheduling, input/output control, file and data management, memory management, and communication control and related services.
  • the adaptive data reduction system 100 of the present invention is applicable on all other commercially available operating systems.
  • the adaptive data reduction system 100 may comprise a source program, executable program (object code), script, or any other entity comprising a set of computer program instructions to be performed.
  • the program is usually translated via a compiler, assembler, interpreter, or the like, which may or may not be included within the memory 42 , so as to operate properly in connection with the O/S 51 .
  • The adaptive data reduction system 100 can be written in (a) an object oriented programming language, which has classes of data and methods, or (b) a procedural programming language, which has routines, subroutines, and/or functions.
  • the computer program instructions may execute entirely on server 11 , partly on the server 11 , as a stand-alone software package, partly on server 11 and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • the I/O devices may include input devices, for example but not limited to, a mouse 44 , keyboard 45 , scanner (not shown), microphone (not shown), etc. Furthermore, the I/O devices may also include output devices, for example but not limited to, a printer (not shown), display 46 , etc. Finally, the I/O devices may further include devices that communicate both inputs and outputs, for instance but not limited to, a network interface card (NIC) or modulator/demodulator 47 (for accessing remote devices, other files, devices, systems, or a network), a RF or other transceiver (not shown), a telephonic interface (not shown), a bridge (not shown), a router (not shown), etc.
  • the software in the memory 42 may further include a basic input output system (BIOS) (omitted for simplicity).
  • BIOS is a set of essential software routines that initialize and test hardware at startup, start the O/S 51 , and support the transfer of data among the hardware devices.
  • the BIOS is stored in some type of read-only-memory, such as ROM, PROM, EPROM, EEPROM or the like, so that the BIOS can be executed when the server 11 is activated.
  • When the server 11 is in operation, the processor 41 is configured to execute software stored within the memory 42, to communicate data to and from the memory 42, and generally to control operations of the server 11 pursuant to the software.
  • the adaptive data reduction system 100 and the O/S 51 are read, in whole or in part, by the processor 41 , perhaps buffered within the processor 41 , and then executed.
  • the adaptive data reduction system 100 can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
  • aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, propagation medium, or other physical device or means that can contain or store a computer program for use by or in connection with a computer related system or method.
  • the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic or optical), a RAM (electronic), a ROM (electronic), an EPROM, EEPROM, or Flash memory (electronic), an optical fiber (optical), and a compact disc (CDROM, CD R/W) (optical).
  • the computer-readable medium could even be paper or another suitable medium, upon which the program is printed or punched (as in paper tape, punched cards, etc.), as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the adaptive data reduction system 100 can be implemented with any one or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
  • the remote devices 15 and 17 - 20 provide access to the adaptive data reduction system 100 of the present invention on server 11 and database 12 using for example, but not limited to an Internet browser.
  • the information accessed in server 11 and database 12 can be provided in a number of different forms including but not limited to ASCII data, WEB page data (i.e., HTML), XML or other type of formatted data.
  • the remote device 15 and 17 - 20 are similar to the description of the components for server 11 described with regard to FIG. 2 .
  • the remote devices 15 and 17 - 20 may be referred to as remote devices 15 for the sake of brevity.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Data reduction in a storage system comprises determining attributes of data for storage in the storage system and determining expected data reduction effectiveness for the data based on said attributes. Said effectiveness indicates the benefit that data reduction is expected to provide for the data based on said attributes. The data reduction further comprises applying data reduction to the data based on the expected data reduction effectiveness and performance impact, to improve resource usage efficiency.

Description

    BACKGROUND
  • The present invention relates generally to data reduction in storage systems, and more particularly to enhancing performance-cost ratio of a primary storage adaptive data reduction system.
  • Storage systems are utilized in information technology environments such as enterprise computing systems. Because information technology system managers are generally interested in increasing data storage efficiency, many modern storage systems provide data reduction for increasing storage efficiency and storage savings. Data reduction techniques are increasingly used to store primary data in less storage space. Examples of such data reduction techniques include compression and deduplication. Data reduction requires a large amount of processing resources, such as processor cycles (MIPS) and memory bus bandwidth, to perform the necessary mathematical transformations on the data, resulting in a higher cost/performance ratio.
  • SUMMARY
  • Embodiments of the present invention provide data reduction for a storage system. One embodiment comprises determining attributes of data for storage in the storage system. Then expected data reduction effectiveness for the data is determined based on said attributes. Said effectiveness indicates the benefit that data reduction is expected to provide for the data based on said attributes. Data reduction is applied to the data based on the expected data reduction effectiveness as weighed against performance impact, to improve resource usage efficiency.
  • In one embodiment, applying data reduction comprises selectively applying data reduction to the data based on the data reduction effectiveness and available processing resources.
  • In one embodiment, a type of data reduction for the data is determined based on the data attributes. In one implementation, determining a type of data reduction for the data comprises determining a type of data reduction that is most effective in reducing the data based on the data attributes. In one embodiment, selectively applying data reduction comprises, based on the data reduction effectiveness, storing the data without applying data reduction.
  • In another embodiment, the present invention provides a storage system comprising a file-write serving module that is invoked in response to a data write, and based on data attributes determines expected data reduction effectiveness for the data based on said attributes. The storage system further comprises an adaptive data reduction module that receives write data, and based on the expected data reduction effectiveness, selectively applies data reduction transformations to the data to achieve best data reduction effectiveness and performance, given limited processing resources.
  • In one embodiment, the adaptive data reduction module balances the data reduction effectiveness with processing resource state, to determine amount of data processing resources to allocate for data reduction transformations to the data.
  • These and other aspects, features and advantages of the invention will be understood with reference to the drawing figures, and detailed description herein, and will be realized by means of the various elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following brief description of the drawings and detailed description of the invention are exemplary and explanatory of preferred embodiments of the invention, and are not restrictive of the invention, as claimed.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
  • FIG. 1 is a block diagram illustrating an example of a network environment for adaptive data reduction in a storage system, according to an embodiment of the present invention;
  • FIG. 2 is a block diagram illustrating an example of a server utilizing an adaptive data reduction process, according to an embodiment of the present invention, as shown in FIG. 1;
  • FIG. 3 shows a block diagram of components of an adaptive data reduction system for a file system of an information technology environment, according to an embodiment of the invention;
  • FIG. 4A shows a block diagram of components of a storage system including an adaptive data reduction system, according to an embodiment of the invention;
  • FIG. 4B shows a block diagram of components of a storage system including an adaptive data reduction system, according to another embodiment of the invention;
  • FIG. 4C shows a block diagram of components of a storage system including an adaptive data reduction system, according to another embodiment of the invention; and
  • FIG. 5 shows a flowchart illustrating an adaptive data reduction process, according to an embodiment of the present invention.
  • The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
  • DETAILED DESCRIPTION
  • The present invention may be understood more readily by reference to the following detailed description of the invention taken in connection with the accompanying drawing figures, which form a part of this disclosure. It is to be understood that this invention is not limited to the specific devices, methods, conditions or parameters described and/or shown herein, and that the terminology used herein is for the purpose of describing particular embodiments by way of example only and is not intended to be limiting of the claimed invention.
  • Embodiments of the invention relate to enhancing performance-cost ratio of a primary storage adaptive data reduction system. An embodiment of the invention allows performance enhancement for storage systems by selectively applying data reduction to data. Data reduction techniques require a large amount of processor (e.g., CPU) resources that might otherwise be used for processing I/O requests, and therefore cause the processor utilization to become a system bottleneck which delays the servicing of other I/Os and lowers the overall throughput. According to embodiments of the invention, applying data reduction techniques selectively lowers the overall use of processor resources for data reduction per I/O, while ensuring that the system retains data reduction effectiveness by using processor resources for data reduction only on the data that will benefit the most, leaving more processor resources for processing other I/O requests, thereby increasing overall throughput.
  • An embodiment of the invention provides performance enhancement functions at different layers in a storage stack. A first performance enhancement function is based on access to specific data information which may be relevant to data reduction. A second performance enhancement function selectively performs data reduction based on data reduction effectiveness for certain data.
  • In one implementation, the first performance enhancement function collects and summarizes data reduction effectiveness information for a unit of data such as a file, and provides that summary to the second performance enhancement function which based on the supplied information selectively allocates processing resources, if any, for data reduction to increase system throughput for storage efficiency and space-saving.
  • Different attributes (characteristics) of the data (e.g., size, type, etc.) are used to determine if the data should be considered for data reduction. As such, the first performance enhancement function enables the second performance enhancement function to optimize data path utilization for storage efficiency. There are many ways that data reduction effectiveness information can be shared, according to how the file-writing and data reduction processing components are organized in the system and the communication paths available to them. In one embodiment, a shared memory can be used to maintain a table of hints, wherein each hint includes a data identifier along with a suggestion of what data reductions would be effective and the likely effectiveness of the suggested techniques. In another embodiment, such a hint might be communicated as a message that accompanies the data, or as parameters in a command structure used to make an I/O request.
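  • The hint table described above can be pictured with a minimal sketch. It is an illustration only, not the disclosed implementation: the names `ReductionHint` and `HintTable`, the string labels "compress" and "dedupe", and the 0.0 to 1.0 effectiveness scale are all assumptions, and an ordinary in-process dictionary stands in for the shared memory.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ReductionHint:
    """One hint about a unit of data (all field names are hypothetical)."""
    data_id: str                    # identifier of the file or data unit
    suggested: Tuple[str, ...]      # e.g. ("compress",), ("dedupe",), or ()
    expected_effectiveness: float   # assumed scale: 0.0 (no benefit) to 1.0


class HintTable:
    """In-process stand-in for the shared-memory table of hints."""

    def __init__(self) -> None:
        self._hints: Dict[str, ReductionHint] = {}

    def publish(self, hint: ReductionHint) -> None:
        # Called by the file-writing component; a newer hint replaces an older one.
        self._hints[hint.data_id] = hint

    def lookup(self, data_id: str) -> Optional[ReductionHint]:
        # Called by the data reduction component; None means "no hint available".
        return self._hints.get(data_id)


table = HintTable()
table.publish(ReductionHint("file-0001", ("compress",), 0.6))
hint = table.lookup("file-0001")
```

  • The same three fields could instead travel as a message alongside the data or as parameters of an I/O command, as the paragraph above notes; only the transport would differ.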
  • According to embodiments of the invention, there are a number of classes of files which are not likely to have a positive impact on data reduction effectiveness but may have a negative impact on performance when processing resources are applied to reduce their sizes.
  • One example of such a class of files is small files. Smaller files generally account for a very small volume of the used capacity of a storage system. An embodiment of the invention selectively decreases allocation of processing resources for reducing the size of smaller files. This is because such file size reduction is unlikely to have a significant effect on data reduction effectiveness (especially where the per-kilobyte count of I/Os to smaller files is disproportionately high relative to I/Os to larger files). In one example, small files (e.g., about 8 KB or less) are not deduplicated because they would cause deduplication directory pages to be cached ineffectively. It may still be useful to compress such files.
  • Another example of such a class of files is pre-compressed or otherwise uncompressible or encrypted files. An embodiment of the invention selectively decreases allocation of processing resources for reducing the size of pre-compressed or otherwise uncompressible or encrypted files. Such files are unlikely to be compressible by generic techniques but would use considerable computing resources in an attempt to compress them (more MIPS per byte are used than for more compressible data). In one example, files known to have poor compression opportunity may be deduplicated but are not compressed.
  • Another example of such a class of files is short-lived files. An embodiment of the invention selectively applies data reduction to the data based on longevity of data. An implementation of the invention selectively decreases allocation of processing resources for reducing the size of short-lived files. Such files are likely to be rewritten and their blocks re-used, and will not contribute to long-term data reduction effectiveness.
  • As such, use of computing resources on reducing sizes of short-lived files may be wasted. In one example, files that will likely be short-lived may be compressed if bandwidth reduction to a backend disk layer is desirable, but deduplicating them may be undesirable if expected deduplication potential is low for such short-lived files.
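  • The three file classes above can be summarized as a small decision function. This is a sketch under stated assumptions, not the disclosed method: the 8 KB cutoff comes from the example in the text, while the extension list, the `short_lived` flag (how longevity is predicted is not specified here), and the `want_backend_bandwidth_savings` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

SMALL_FILE_BYTES = 8 * 1024  # "about 8 KB or less" in the text
# Hypothetical list of extensions that usually hold compressed or encrypted data.
INCOMPRESSIBLE_EXTENSIONS = {".zip", ".gz", ".jpg", ".mp4", ".gpg"}


@dataclass
class FileAttributes:
    size_bytes: int
    extension: str     # including the leading dot, e.g. ".txt"
    short_lived: bool  # assumed to be supplied by some other predictor


def suggest_reductions(attrs: FileAttributes,
                       want_backend_bandwidth_savings: bool = False) -> Dict[str, bool]:
    """Return which reductions look worthwhile for a file with these attributes."""
    compress = True
    dedupe = True
    if attrs.size_bytes <= SMALL_FILE_BYTES:
        dedupe = False       # small files: skip deduplication, compression may still help
    if attrs.extension.lower() in INCOMPRESSIBLE_EXTENSIONS:
        compress = False     # pre-compressed or encrypted: may deduplicate, do not compress
    if attrs.short_lived:
        dedupe = False       # simplification: treat deduplication potential as low
        compress = compress and want_backend_bandwidth_savings
    return {"compress": compress, "dedupe": dedupe}


small_text = suggest_reductions(FileAttributes(4096, ".txt", False))
big_archive = suggest_reductions(FileAttributes(1 << 20, ".zip", False))
```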
  • One embodiment of the invention comprises an adaptive data reduction system implementing an adaptive data reduction method for selectively allocating processing resources for data reduction for scenarios which result in significant data reduction, thereby maintaining (or improving) overall system performance.
  • Referring now to the drawings, FIG. 1 illustrates an example of the basic components of an information technology system 10 utilizing an adaptive data reduction system 100, used in connection with a preferred embodiment of the present invention. The system 10 includes a storage server 11 and the remote devices 15 and 17-20 that may utilize the adaptive data reduction system 100 of the present invention.
  • Illustrated in FIG. 2 is a block diagram demonstrating an example of the storage server 11, as shown in FIG. 1, utilizing an embodiment of the adaptive data reduction system 100 according to an embodiment of the present invention. The adaptive data reduction system 100 utilizes a selective data reduction process for data reduction in a file system, according to an embodiment of the invention. In another embodiment, the adaptive data reduction system 100 may be implemented as a standalone system as shown in FIG. 1.
  • According to embodiments of the invention, data is selected for data reduction based on the nature (attributes) of the data and therefore the benefit that data reduction is expected to provide for that data, to improve resource usage efficiency as described herein. Said expected benefit is based on expected effectiveness of data reduction processing for the data given the nature of the data.
  • In one embodiment, the invention provides an adaptive data reduction system 100 implementing the abovementioned features, to constrain the amount of processing resources (e.g., MIPS and memory bandwidth) expended by a storage system, while continuing to effectively support data reduction or even other high resource usage functions. In one embodiment, the adaptive data reduction system 100 is useful with current storage platforms, and will lower the cost/terabyte of many configurations by allowing configurations that have more storage (e.g., disks) attached to a controller subsystem of the storage system.
  • In one embodiment, the adaptive data reduction system monitors its own internal state such as processor usage and data paths, to be more selective or less selective about the amount of data reduction it performs. For example, for files that are marginally compressible, the adaptive data reduction system 100 may select to compress such files when processor load is light. The adaptive data reduction system 100 may forego file compression when processor load is heavy.
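  • One way to picture this load-dependent selectivity is a threshold that rises with processor load. The sketch below is an assumption-laden illustration: the linear mapping, the constants `LOW_BAR` and `HIGH_BAR`, and the 0.0 to 1.0 scales for both load and expected effectiveness are hypothetical choices, not values taken from this disclosure.

```python
LOW_BAR = 0.1   # hypothetical: minimum effectiveness worth compressing when idle
HIGH_BAR = 0.7  # hypothetical: minimum effectiveness worth compressing when saturated


def min_effectiveness_required(cpu_load: float) -> float:
    """Map processor load in [0, 1] to the effectiveness a file must promise."""
    if not 0.0 <= cpu_load <= 1.0:
        raise ValueError("cpu_load must be between 0.0 and 1.0")
    # Linear interpolation: the busier the processor, the higher the bar.
    return LOW_BAR + (HIGH_BAR - LOW_BAR) * cpu_load


def should_compress(expected_effectiveness: float, cpu_load: float) -> bool:
    """Compress only if the expected benefit clears the load-dependent bar."""
    return expected_effectiveness >= min_effectiveness_required(cpu_load)


marginal_light = should_compress(0.3, cpu_load=0.1)  # marginal file, light load
marginal_heavy = should_compress(0.3, cpu_load=0.9)  # same file, heavy load
```

  • With these assumed constants, a marginally compressible file is compressed under light load and skipped under heavy load, while a highly compressible file clears the bar in both cases.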
  • Referring to FIG. 3, in one implementation, the adaptive data reduction system 100 comprises a detection module 101 that detects when files being written to storage are not likely to make positive impact on data reduction effectiveness, and/or are likely to negatively impact system performance.
  • The adaptive data reduction system 100 further includes a controller module 102 that adapts the data path for servicing of such writes so as to selectively bypass data reduction (e.g., bypass either compression or deduplication, or both). The adaptive data reduction system 100 further includes a reduction process module 103 that performs data reduction as determined by the controller module 102. Finally, data is stored in the data storage 100C.
  • Referring to FIG. 4A, in another implementation, the present invention provides a storage system including an adaptive data reduction system 100 comprising a file-write serving module 110 which is invoked when a file write occurs and determines (by checking file characteristics such as file size and file type), whether the file data can benefit from data reduction techniques (e.g., compression, deduplication, etc).
  • The storage system further comprises a data reduction module 111 which receives write data. The data reduction module 111 uses the information from the file-write serving module 110 (indicating whether the file characteristics suggest that the file will benefit from data reduction), to selectively apply data reduction transformations to the write data.
  • In one implementation, the data reduction module 111 simultaneously achieves best data reduction effectiveness and performance for the write data, given limited processing resources by choosing to perform data reduction only on the data that will achieve maximum benefit, while ensuring that the current I/O load has enough processor resources to provide expected performance. For example, the data reduction module 111 balances the information it receives from the file-write serving module 110 with its own internal state (e.g., the processing load on the data reduction module 111), to decide on allocation of processing resources for compression and deduplication transformations of the write data. Finally, data is stored in the data storage 100C.
  • As shown in FIG. 4A, in one embodiment, the file-write serving module 110 may be a component of a file server 100A, while the data reduction module 111 may be a component of a block storage server 100B. In an alternative embodiment, shown in FIG. 4B, the file-write serving module 110, the data reduction module 111 and the block storage layer 100B may all be components of a file server 100A. In an alternative embodiment, shown in FIG. 4C, both the file-write serving module 110 and the data reduction module 111 may be components of a database server 100F, etc. In said alternative embodiments, the invention operates similarly, although the method of communicating the information between the two components 110 and 111 is different for the different storage system arrangements.
  • FIG. 5 is a flowchart of an adaptive data reduction process 50 for the adaptive data reduction system 100 in a storage system, according to an embodiment of the invention. In a detection process 59, a process block 51 comprises detecting when a file is being written to storage. Process block 52 comprises checking attributes of the file. Process block 53 comprises determining expected data reduction effectiveness in saving storage space given performance impact. One implementation comprises determining whether, based on the attributes, data reduction is likely to make a positive impact on data reduction effectiveness, according to information described in previous paragraphs, such as file size, or whether the file type indicates that the file is likely to be uncompressible or short-lived.
  • Process block 54 comprises communicating the data reduction effectiveness determination to a data reduction controller module. In a data reduction process 60 of a data reduction controller module, data reduction is selectively applied to the data based on the expected data reduction effectiveness as weighed against system performance impact. Specifically, process block 55 comprises determining type of data reduction (if any) for the file based on data reduction effectiveness. According to process block 56, if data reduction is to be applied based on the determining step in process block 55, then in process block 57 data reduction is applied to the file, otherwise in process block 58 data reduction is skipped for the file (the file data is stored without data reduction).
  • In one embodiment, the process 50 simultaneously achieves best data reduction effectiveness and performance for the write data, given limited processing resources. Process block 55 balances the data reduction effectiveness information from process block 53 with processing load, to decide on allocation of processing resources for compression and deduplication transformations of the write data.
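  • The flow of process blocks 51 through 58 can be traced end to end in a short sketch. It is illustrative only: the function names, the effectiveness scores, the extension list, and the rule "apply reduction when expected effectiveness exceeds processor load" are assumptions, `zlib` stands in for whatever compression an implementation would use, and deduplication is omitted for brevity.

```python
import zlib
from typing import Dict, Tuple

INCOMPRESSIBLE = (".zip", ".gz", ".jpg")  # hypothetical extension list


def expected_effectiveness(name: str, size: int) -> float:
    """Blocks 52-53: check attributes and estimate benefit (assumed 0..1 scale)."""
    if name.lower().endswith(INCOMPRESSIBLE):
        return 0.0   # likely uncompressible
    if size <= 8 * 1024:
        return 0.2   # small file: little capacity to gain
    return 0.8


def write_file(name: str, data: bytes, cpu_load: float,
               storage: Dict[str, Tuple[str, bytes]]) -> bool:
    """Handle one file write (block 51); return True if reduction was applied."""
    effectiveness = expected_effectiveness(name, len(data))
    # Block 54: the determination is handed to the controller; here that is
    # simply the local variable being used by the code below.
    # Blocks 55-56: weigh expected benefit against current load (assumed rule).
    apply_reduction = effectiveness > cpu_load
    if apply_reduction:
        storage[name] = ("zlib", zlib.compress(data))  # block 57
    else:
        storage[name] = ("raw", data)                  # block 58
    return apply_reduction


storage: Dict[str, Tuple[str, bytes]] = {}
applied_large = write_file("report.txt", b"x" * 100_000, 0.3, storage)
applied_zip = write_file("backup.zip", b"y" * 100_000, 0.0, storage)
```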
  • In the system 10 of FIG. 1, each of the remote devices 15 and 17-20 has applications and can have a local database 16. Server 11 contains applications, and a database 12 that may be accessed by remote devices 15 and 17-20 via connections 14(A-F), respectively, over network 13. In one implementation, the server 11 executes software for a computer network and controls access to itself and database 12.
  • The remote devices 15 and 17-20 may access the database 12 over a network 13, such as but not limited to: the Internet, a local area network (LAN), a wide area network (WAN), via a telephone line using a modem (POTS), Bluetooth, WiFi, WiMAX, cellular, optical, satellite, radio frequency (RF), Ethernet, magnetic induction, coax, RS-485, or other like networks. The server 11 may also be connected to the LAN within an organization.
  • The remote devices 15 and 17-20 may each be located at remote sites. Remote devices 15 and 17-20 include but are not limited to, PCs, workstations, laptops, handheld computers, pocket PCs, PDAs, pagers, wireless application protocol (WAP) devices, non-WAP devices, cell phones, palm devices, printing devices and the like. When a user at one of the remote devices 15 and 17-20 desires to access data from the database 12 at the server 11, the remote device communicates over the network 13, to access the server 11 and database 12.
  • Third party computer systems 21 and databases 22 can access the server 11 (FIG. 2). Data that is obtained from third party computer systems 21 and database 22 can be stored on server 11 and database 12 in order to provide later access to a user on remote devices 15 and 17-20. It is also contemplated that for certain types of data that the remote devices 15 and 17-20 can access the third party computer systems 21 and database 22 directly using the network 13.
  • In one embodiment the server 11 includes a storage system. Generally, in terms of hardware architecture, as shown in FIG. 2, the server 11 includes a processor 41, a computer readable medium such as memory 42, and one or more input and/or output (I/O) devices (or peripherals) that are communicatively coupled via a local interface 43. The local interface 43 can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface 43 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface 43 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
  • The processor 41 is a hardware device for executing software that can be stored in memory 42. The processor 41 can be virtually any custom made or commercially available processor, a central processing unit (CPU), data signal processor (DSP) or an auxiliary processor among several processors associated with the server 11, and a semiconductor based microprocessor (in the form of a microchip) or a microprocessor.
  • The memory 42 can include any one or combination of volatile memory elements (e.g., random access memory (RAM), such as dynamic random access memory (DRAM), static random access memory (SRAM), etc.) and nonvolatile memory elements (e.g., read only memory (ROM), erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 42 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 42 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 41.
  • The software in memory 42 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example illustrated in FIG. 2, the software in the memory 42 includes a suitable operating system (O/S) 51 and the adaptive data reduction system 100 of the present invention. The adaptive data reduction system 100 comprises functional components and process blocks described herein.
  • The operating system 51 essentially controls the execution of other computer programs, such as the adaptive data reduction system 100, and provides scheduling, input/output control, file and data management, memory management, and communication control and related services. However, the adaptive data reduction system 100 of the present invention is applicable on all other commercially available operating systems.
  • The adaptive data reduction system 100 may comprise a source program, executable program (object code), script, or any other entity comprising a set of computer program instructions to be performed. When the adaptive data reduction system 100 is a source program, then the program is usually translated via a compiler, assembler, interpreter, or the like, which may or may not be included within the memory 42, so as to operate properly in connection with the O/S 51. Furthermore, the adaptive data reduction system 100 can be written in (a) an object oriented programming language, which has classes of data and methods, or (b) a procedural programming language, which has routines, subroutines, and/or functions. The computer program instructions may execute entirely on server 11, partly on the server 11, as a stand-alone software package, partly on server 11 and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The I/O devices may include input devices, for example but not limited to, a mouse 44, keyboard 45, scanner (not shown), microphone (not shown), etc. Furthermore, the I/O devices may also include output devices, for example but not limited to, a printer (not shown), display 46, etc. Finally, the I/O devices may further include devices that communicate both inputs and outputs, for instance but not limited to, a network interface card (NIC) or modulator/demodulator 47 (for accessing remote devices, other files, devices, systems, or a network), a RF or other transceiver (not shown), a telephonic interface (not shown), a bridge (not shown), a router (not shown), etc.
  • If the server 11 is a PC, workstation, intelligent device or the like, the software in the memory 42 may further include a basic input output system (BIOS) (omitted for simplicity). The BIOS is a set of essential software routines that initialize and test hardware at startup, start the O/S 51, and support the transfer of data among the hardware devices. The BIOS is stored in some type of read-only-memory, such as ROM, PROM, EPROM, EEPROM or the like, so that the BIOS can be executed when the server 11 is activated.
  • When the server 11 is in operation, the processor 41 is configured to execute software stored within the memory 42, to communicate data to and from the memory 42, and generally to control operations of the server 11 pursuant to the software. The adaptive data reduction system 100 and the O/S 51 are read, in whole or in part, by the processor 41, perhaps buffered within the processor 41, and then executed.
  • When the adaptive data reduction system 100 is implemented in software, as is shown in FIG. 2, it should be noted that the adaptive data reduction system 100 can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
  • As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, propagation medium, or other physical device or means that can contain or store a computer program for use by or in connection with a computer related system or method.
  • More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic or optical), a RAM (electronic), a ROM (electronic), an EPROM, EEPROM, or Flash memory (electronic), an optical fiber (optical), and a compact disc (CDROM, CD R/W) (optical). Note that the computer-readable medium could even be paper or another suitable medium, upon which the program is printed or punched (as in paper tape, punched cards, etc.), as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
  • A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • In an alternative embodiment, where the adaptive data reduction system 100 is implemented in hardware, the adaptive data reduction system 100 can be implemented with any one or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
  • The remote devices 15 and 17-20 provide access to the adaptive data reduction system 100 of the present invention on server 11 and database 12 using, for example but not limited to, an Internet browser. The information accessed in server 11 and database 12 can be provided in a number of different forms including but not limited to ASCII data, WEB page data (e.g., HTML), XML or other type of formatted data.
  • As illustrated, the remote devices 15 and 17-20 are similar to the components of server 11 described with regard to FIG. 2. The remote devices 15 and 17-20 may be referred to as remote devices 15 for the sake of brevity.
  • Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention.
  • In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • It should be emphasized that the above-described embodiments of the present invention, particularly, any “preferred” embodiments, are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the invention.
  • Many variations and modifications may be made to the above-described embodiment(s) of the invention without departing substantially from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention and protected by the following claims.

Claims (20)

What is claimed is:
1. A method comprising:
determining attributes of data for storage in a storage system; and
determining a storage system configuration for the storage system based on the attributes of the data, wherein the storage system configuration reduces an amount of processing resources expended by the storage system while the storage system supports a usage function.
2. The method of claim 1, wherein the usage function comprises data reduction.
3. The method of claim 2, wherein determining a storage system configuration based on the attributes of the data comprises determining expected effectiveness of data reduction for the data based on the attributes of the data.
4. The method of claim 3, wherein determining expected effectiveness of data reduction for the data based on the attributes of the data comprises:
monitoring an internal state of the storage system, wherein the internal state is indicative of processor usage and data paths of the storage system; and
determining an amount of data reduction to perform based on the internal state of the storage system.
5. The method of claim 1, wherein the processing resources expended include memory bandwidth.
6. The method of claim 5, wherein the storage system configuration has increased storage attached to a controller subsystem of the storage system.
7. The method of claim 1, wherein the attributes of the data comprise at least one of a size attribute of the data or a longevity attribute of the data.
8. A system comprising a computer processor, a computer-readable hardware storage device, and program code embodied with the computer-readable hardware storage device for execution by the computer processor to implement a method comprising:
determining attributes of data for storage in a storage system; and
determining a storage system configuration for the storage system based on the attributes of the data, wherein the storage system configuration reduces an amount of processing resources expended by the storage system while the storage system supports a usage function.
9. The system of claim 8, wherein the usage function comprises data reduction.
10. The system of claim 9, wherein determining a storage system configuration based on the attributes of the data comprises determining expected effectiveness of data reduction for the data based on the attributes of the data.
11. The system of claim 10, wherein determining expected effectiveness of data reduction for the data based on the attributes of the data comprises:
monitoring an internal state of the storage system, wherein the internal state is indicative of processor usage and data paths of the storage system; and
determining an amount of data reduction to perform based on the internal state of the storage system.
12. The system of claim 8, wherein the processing resources expended include memory bandwidth.
13. The system of claim 12, wherein the storage system configuration has increased storage attached to a controller subsystem of the storage system.
14. The system of claim 8, wherein the attributes of the data comprise at least one of a size attribute of the data or a longevity attribute of the data.
15. A computer program product comprising a computer-readable hardware storage device having program code embodied therewith, the program code being executable by a computer to implement a method comprising:
determining attributes of data for storage in a storage system; and
determining a storage system configuration for the storage system based on the attributes of the data, wherein the storage system configuration reduces an amount of processing resources expended by the storage system while the storage system supports a usage function.
16. The computer program product of claim 15, wherein the usage function comprises data reduction.
17. The computer program product of claim 16, wherein determining a storage system configuration based on the attributes of the data comprises determining expected effectiveness of data reduction for the data based on the attributes of the data.
18. The computer program product of claim 17, wherein determining expected effectiveness of data reduction for the data based on the attributes of the data comprises:
monitoring an internal state of the storage system, wherein the internal state is indicative of processor usage and data paths of the storage system; and
determining an amount of data reduction to perform based on the internal state of the storage system.
19. The computer program product of claim 15, wherein the processing resources expended include memory bandwidth.
20. The computer program product of claim 19, wherein the storage system configuration has increased storage attached to a controller subsystem of the storage system.
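
The method of claims 1-4 and 7 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class names, the scoring heuristic, the 1 MiB and 30-day saturation points, and the 0.25 threshold are all hypothetical choices made for illustration; the claims specify only that a configuration is determined from data attributes such as size and longevity, and that the amount of data reduction depends on a monitored internal state indicative of processor usage and data paths. For readability the sketch treats expected effectiveness and the amount of reduction as two separate steps.

```python
from dataclasses import dataclass


@dataclass
class DataAttributes:
    size_bytes: int        # size attribute of the data (claim 7)
    longevity_days: float  # longevity attribute of the data (claim 7)


@dataclass
class InternalState:
    cpu_utilization: float        # processor usage, 0.0 to 1.0
    data_path_utilization: float  # data path load, 0.0 to 1.0


@dataclass
class StorageConfig:
    reduce_data: bool          # whether data reduction is enabled
    reduction_fraction: float  # share of incoming data to reduce, 0.0 to 1.0


def expected_effectiveness(attrs: DataAttributes) -> float:
    """Hypothetical heuristic: larger, longer-lived data is assumed
    to repay the processing cost of reduction better."""
    size_score = min(attrs.size_bytes / (1 << 20), 1.0)      # saturates at 1 MiB
    longevity_score = min(attrs.longevity_days / 30.0, 1.0)  # saturates at 30 days
    return size_score * longevity_score


def reduction_amount(state: InternalState) -> float:
    """Scale the amount of reduction to the headroom left on the
    busiest monitored resource (assumed policy)."""
    headroom = 1.0 - max(state.cpu_utilization, state.data_path_utilization)
    return max(0.0, min(1.0, headroom))


def determine_configuration(attrs: DataAttributes,
                            state: InternalState,
                            threshold: float = 0.25) -> StorageConfig:
    """Skip reduction when it is unlikely to pay off; otherwise
    reduce as much as the current internal state allows."""
    if expected_effectiveness(attrs) < threshold:
        return StorageConfig(reduce_data=False, reduction_fraction=0.0)
    return StorageConfig(reduce_data=True,
                         reduction_fraction=reduction_amount(state))
```

Under these assumptions, a large, long-lived object arriving while the system is half loaded would have about half of its data reduced, while a small, short-lived object would bypass reduction entirely, which avoids spending processing resources where little space would be saved.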
US15/479,172 2012-04-30 2017-04-04 Enhancing performance-cost ratio of a primary storage adaptive data reduction system Abandoned US20170206011A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/479,172 US20170206011A1 (en) 2012-04-30 2017-04-04 Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/460,611 US9659060B2 (en) 2012-04-30 2012-04-30 Enhancing performance-cost ratio of a primary storage adaptive data reduction system
US15/479,172 US20170206011A1 (en) 2012-04-30 2017-04-04 Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/460,611 Continuation US9659060B2 (en) 2012-04-30 2012-04-30 Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Publications (1)

Publication Number Publication Date
US20170206011A1 true US20170206011A1 (en) 2017-07-20

Family

ID=49478230

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/460,611 Expired - Fee Related US9659060B2 (en) 2012-04-30 2012-04-30 Enhancing performance-cost ratio of a primary storage adaptive data reduction system
US15/479,172 Abandoned US20170206011A1 (en) 2012-04-30 2017-04-04 Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/460,611 Expired - Fee Related US9659060B2 (en) 2012-04-30 2012-04-30 Enhancing performance-cost ratio of a primary storage adaptive data reduction system

Country Status (1)

Country Link
US (2) US9659060B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262393B2 (en) * 2016-12-29 2019-04-16 Intel Corporation Multi-sample anti-aliasing (MSAA) memory bandwidth reduction for sparse sample per pixel utilization

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9177028B2 (en) 2012-04-30 2015-11-03 International Business Machines Corporation Deduplicating storage with enhanced frequent-block detection
WO2016048325A1 (en) * 2014-09-25 2016-03-31 Hewlett Packard Enterprise Development Lp Storage space allocation
US9916320B2 (en) 2015-04-26 2018-03-13 International Business Machines Corporation Compression-based filtering for deduplication
KR102345517B1 (en) * 2020-05-06 2021-12-30 인하대학교 산학협력단 Deduplication adapted casedb for edge computing
CN114780501A (en) * 2021-01-22 2022-07-22 伊姆西Ip控股有限责任公司 Data processing method, electronic device and computer program product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030044076A1 (en) * 2001-08-24 2003-03-06 International Business Machines Corporation Managing image storage size
US20070104118A1 (en) * 2005-11-09 2007-05-10 International Business Machines Corporation Determining, transmitting, and receiving performance information with respect to an operation performed locally and at remote nodes
US20090100195A1 (en) * 2007-10-11 2009-04-16 Barsness Eric L Methods and Apparatus for Autonomic Compression Level Selection for Backup Environments
US20110225130A1 (en) * 2010-03-12 2011-09-15 Fujitsu Limited Storage device, and program and method for controlling storage device
US20110238914A1 (en) * 2009-02-25 2011-09-29 Hiroshi Hirayama Storage apparatus and data processing method for the same
US20120079325A1 (en) * 2010-09-29 2012-03-29 Sepaton, Inc. System Health Monitor
US8856797B1 (en) * 2011-10-05 2014-10-07 Amazon Technologies, Inc. Reactive auto-scaling of capacity

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4755642B2 (en) * 2004-04-26 2011-08-24 ストアウィズ インク Method and system for file compression and operation of compressed files for storage
US8335760B1 (en) * 2005-02-07 2012-12-18 Hewlett-Packard Development, L. P. Grid computing system to manage utility service content
US7539661B2 (en) 2005-06-02 2009-05-26 Delphi Technologies, Inc. Table look-up method with adaptive hashing
US8412682B2 (en) 2006-06-29 2013-04-02 Netapp, Inc. System and method for retrieving and using block fingerprints for data deduplication
US7962452B2 (en) 2007-12-28 2011-06-14 International Business Machines Corporation Data deduplication by separating data from meta data
US8266114B2 (en) 2008-09-22 2012-09-11 Riverbed Technology, Inc. Log structured content addressable deduplicating storage
US8392791B2 (en) 2008-08-08 2013-03-05 George Saliba Unified data protection and data de-duplication in a storage system
US10642794B2 (en) 2008-09-11 2020-05-05 Vmware, Inc. Computer storage deduplication
US20100088296A1 (en) 2008-10-03 2010-04-08 Netapp, Inc. System and method for organizing data to facilitate data deduplication
WO2010045262A1 (en) 2008-10-14 2010-04-22 Wanova Technologies, Ltd. Storage-network de-duplication
US8725946B2 (en) 2009-03-23 2014-05-13 Ocz Storage Solutions, Inc. Mass storage system and method of using hard disk, solid-state media, PCIe edge connector, and raid controller
US8140491B2 (en) 2009-03-26 2012-03-20 International Business Machines Corporation Storage management through adaptive deduplication
JP5183598B2 (en) 2009-03-30 2013-04-17 富士フイルム株式会社 Planographic printing plate precursor
US20100274772A1 (en) 2009-04-23 2010-10-28 Allen Samuels Compressed data objects referenced via address references and compression references
US9141621B2 (en) 2009-04-30 2015-09-22 Hewlett-Packard Development Company, L.P. Copying a differential data store into temporary storage media in response to a request
US8204867B2 (en) 2009-07-29 2012-06-19 International Business Machines Corporation Apparatus, system, and method for enhanced block-level deduplication
JP5595701B2 (en) 2009-09-16 2014-09-24 株式会社日立製作所 File management method and storage system
JP5427533B2 (en) 2009-09-30 2014-02-26 株式会社日立製作所 Method and system for transferring duplicate file in hierarchical storage management system
US20110093439A1 (en) 2009-10-16 2011-04-21 Fanglu Guo De-duplication Storage System with Multiple Indices for Efficient File Storage
AU2010200866B1 (en) 2010-03-08 2010-09-23 Quantum Corporation Data reduction indexing
US8396873B2 (en) 2010-03-10 2013-03-12 Emc Corporation Index searching using a bloom filter
US8442942B2 (en) 2010-03-25 2013-05-14 Andrew C. Leppard Combining hash-based duplication with sub-block differencing to deduplicate data
US8397080B2 (en) 2010-07-29 2013-03-12 Industrial Technology Research Institute Scalable segment-based data de-duplication system and method for incremental backups
CN102024032A (en) 2010-11-29 2011-04-20 广州明朝网络科技有限公司 Distributed data caching and persisting method and system based on Erlang
US8392384B1 (en) 2010-12-10 2013-03-05 Symantec Corporation Method and system of deduplication-based fingerprint index caching
US8898119B2 (en) 2010-12-15 2014-11-25 Netapp, Inc. Fingerprints datastore and stale fingerprint removal in de-duplication environments
US8380681B2 (en) * 2010-12-16 2013-02-19 Microsoft Corporation Extensible pipeline for data deduplication
US9116909B2 (en) 2010-12-29 2015-08-25 Amazon Technologies, Inc. Reduced bandwidth data uploading in data systems
CN102156736A (en) 2011-04-12 2011-08-17 上海电通信息服务有限公司 Method for transmitting data between SAP (Systems Application) system and SQL (Structured Query Language) database
US9292530B2 (en) 2011-06-14 2016-03-22 Netapp, Inc. Object-level identification of duplicate data in a storage system
US8589640B2 (en) 2011-10-14 2013-11-19 Pure Storage, Inc. Method for maintaining multiple fingerprint tables in a deduplicating storage system
US8930307B2 (en) 2011-09-30 2015-01-06 Pure Storage, Inc. Method for removing duplicate data from a storage array
US9048862B2 (en) * 2012-04-11 2015-06-02 Netapp, Inc. Systems and methods for selecting data compression for storage data in a storage system
US9177028B2 (en) 2012-04-30 2015-11-03 International Business Machines Corporation Deduplicating storage with enhanced frequent-block detection

Also Published As

Publication number Publication date
US9659060B2 (en) 2017-05-23
US20130290276A1 (en) 2013-10-31

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION