WO2004023792A1 - Method of real-time disk-scheduling, disk-scheduler, file system, data storage system and computer program product - Google Patents

Info

Publication number
WO2004023792A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
error
time
estimation
performance
Application number
PCT/IB2003/003380
Other languages
French (fr)
Inventor
Rudi J. M. Wijnands
Ozcan Mesut
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Priority to AU2003250409A (AU2003250409A1)
Publication of WO2004023792A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2404 Monitoring of server processing errors or hardware failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N21/2326 Scheduling disk or memory reading operations

Definitions

  • The invention relates to a method of real-time disk-scheduling wherein pending requests to the disk are arranged so as to be processed efficiently, taking into account, for each request, an estimation of the disk service time under a given condition of the performance behavior of the disk. Further, the invention relates to a disk-scheduler wherein pending requests to the disk are arranged so as to be processed efficiently in real time, taking into account, for each request, an estimation of the service time of the disk under a given condition of the performance behavior of the disk.
  • The invention also relates to a file system, a data storage system and a computer program product.
  • For real-time hard-disk data processing, a real-time file system is usually used.
  • Conventional data processing systems aim for maximum data integrity, for instance by not completing a command until it has been properly executed.
  • Traditional data-oriented processing systems have no real-time requirements; aiming for maximum data integrity is appropriate primarily for traditional types of data, such as information technology data, which are reliability-critical.
  • For information technology data and/or reliability-critical data, a device should not return errors to the host regarding a data transfer until all possible data recovery procedures have been exhausted.
  • A stream is the time-based transfer of data to or from a device.
  • Stream data are defined as time-critical data, unlike the information technology data defined above. Data streaming therefore has to fulfill a variety of different requirements.
  • A stream is composed of one or more allocation units.
  • An allocation unit is the smallest logically contiguous group of blocks on a storage medium. Each unit may be accessed by one or more requests to the disk.
  • All logical block addresses associated with a single allocation unit are logically contiguous.
  • The numbers of logical block addresses in the requests and allocation units are important for maintaining the stream data rate. For instance, requests should be chosen such that the stream transfer rate requirements of the storage medium or disk are not compromised.
  • Stream data, and in particular audio-video data, have a time-based orientation. Such data should be delivered within certain time constraints.
  • A real-time file system usually comprises a hard-disk request scheduler.
  • The hard-disk request scheduler arranges requests from the file system such that they are processed efficiently.
  • A contemporary real-time file system is described in principle in WO 01/25892 A1.
  • There, a bandwidth allocator is used to allocate the bandwidth of the storage system or disk. This information is used to order requests to the disk in a guaranteed-rate queue according to their deadlines. Other requests are ordered in a non-rate-guaranteed queue according to their priorities.
  • A disk error may arise from physical errors on a hard disk. Physical errors may be handled by a prior-art defect management method, as known e.g. from GB 2 285 166 A, relying on an allocation and re-allocation scheme for erroneous sectors on a hard disk, sometimes referred to as a skip-and-slip scheme.
  • Contemporary disk-scheduling methods do not take disk errors such as physical errors, re-assigned sectors or erroneous data into account when estimating the service time of the disk, basically because either the locations of such disk errors on the data storage disk or the performance penalty they cause is not known.
  • The object of the invention is to provide a method and an apparatus capable of improving the estimation of the service time of a data storage disk and/or data storage system, so as to achieve a more realistic prediction of the service time of a disk request and thereby increase the performance efficiency of the disk and/or the data storage system.
  • This object is achieved by a method as referenced in the introduction, wherein, in accordance with the invention, the estimation is performed on the basis of a disk model describing the performance behavior of the disk, and the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error.
  • Most preferably, the estimation is performed by means of a disk-scheduler.
  • The disk-scheduler is advantageously part of a real-time file system, in particular of the file system layer of a real-time file system.
  • For the estimation, the disk-scheduler may use the disk model.
  • The applied disk model should be able to describe the complete behavior of the hard disk.
  • Such a model may comprise several time parameters, such as a host queue time and a device queue time, which mainly account for the time a request waits in the host or in the disk while previous requests are being served. Further, mechanical parameters of the disk, in particular of the disk head, may be included, such as a seek time or a jump time.
  • Time parameters accounting for the rotation of the disk may be included as a rotational latency time or a rotational transfer time.
  • Furthermore, bus delays such as a bus busy time and a bus transfer time may be included to provide a complete model of the performance behavior of the hard disk and its periphery.
  • A disk error may be localized either by a specific measurement or in the course of an access to the disk, e.g. a read or a write access.
  • A write access is particularly advantageous, as an error can already be detected when writing to the disk. The error is then already known when the disk is read, so that further delays are largely prevented.
  • The performance penalty may be determined either by measurement or by calculation, the latter preferably on the basis of suitably provided data. It is particularly preferred to localize the disk error and/or to determine the performance penalty due to the disk error in the course of a single measurement in which one or more accesses to the disk are made. Further developed configurations thereof are outlined in dependent claims 9 to 12. Preferred embodiments of these and other configurations of the proposed method are outlined in the detailed description.
  • The object is also achieved by a disk-scheduler as mentioned in the introduction, wherein, according to the invention, the disk-scheduler comprises means for performing the estimation on the basis of a disk model describing the performance behavior of the disk, and means for providing information on a reduction of the performance behavior of the disk caused by a disk error.
  • The means for performing the estimation may be any kind of software code section capable of calculating, on the basis of a disk model, a service time for a pending request to the disk, thereby describing the performance behavior of the disk.
  • The means for providing information may be any kind of data storage, data delivery or software means capable of making the information obtained from disk error localization and performance penalty determination available to the means for estimating the service time of the disk under the given performance behavior of the disk.
  • The invention also relates to a file system comprising such a disk-scheduler.
  • The invention further relates to a data storage system, e.g. a host, comprising a disk or a disk system and a file system as outlined above, wherein a device driver is provided for communication between the disk and the file system.
  • The data storage system may further comprise an application layer on top of the file system layer, wherein an application programming interface is provided for communication between the application and the file system.
  • The invention further relates to a computer program product storable on a medium readable by a computer system, comprising a software code section which causes the computer system to execute the proposed method when the product is executed on the computer system.
  • Figure 1: A scheme of a data storage system adapted for real-time applications, wherein a performance penalty due to a disk error is determined, provided to a hard-disk-drive scheduler and taken into account in the estimation of the service time of the disk under a given condition of the performance behavior of the disk.
  • Figure 1 shows a preferred embodiment of a layered architecture of a data storage system having an application layer 1, which uses a real-time file system 2a to access a hard disk 4 via a device driver 3.
  • The real-time file system 2a comprises a file system layer 2 for communication with two application programming interfaces 5, 6.
  • One application programming interface 5 is adapted mainly for non-real-time applications, i.e. for best-effort PC-like file access, and provides normal file access functions such as OPEN, CLOSE, READ, WRITE, SEEK, etc. to handle information technology data.
  • A second application programming interface 6 is adapted for treating files as real-time streams, for example with audio-visual data, by means of functions such as START_STREAM, STOP_STREAM, PAUSE_STREAM, etc. to handle stream data.
  • The file system 2 creates requests and issues them to a hard-disk request scheduler 7.
  • The scheduler 7 determines the type of each request, i.e. best-effort or real-time.
  • The hard-disk-drive scheduler 7 arranges and orders the pending requests to the disk such that they are processed efficiently.
  • The scheduler predicts a nominal service time of the disk, wherein the estimation of service time is performed on the basis of a disk model describing the performance behavior of the disk, and the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error.
  • The disk errors are identified and registered either in the file system layer 2 or in the device driver layer 3, and are reported by the disk 4 to the scheduler 7 via the driver 3. Read and write accesses and the respective communication of requests and information are indicated by arrows in Figure 1.
  • As a result, the scheduler makes realistic service time predictions. Consequently, the file system and/or the data storage system is able to meet the real-time requirements within a deadline predetermined according to these service time predictions.
  • A disk error is localized on the disk 4, and the performance penalty due to the disk error is determined by the scheduler 7.
  • The disk error is localized by a specific measurement in which one or more accesses to the disk are made.
  • Such a specific measurement may be made once, merely for the purpose of localizing a disk error and identifying its logical block address.
  • Such a measurement may also be performed during or immediately after an access to the disk, either a write or a read access.
  • The logical block addresses of disk errors and/or other relevant data may be registered by any means such that the registered data are accessible to the system and can be provided to the scheduler in a proper and efficient way.
  • One possibility, for instance, is to register the disk error at the file system layer, preferably by means of the disk-scheduler. Another possibility is to register this information in a log file stored on a storage medium or storage device, i.e. on the disk. Such a log file could be read by the scheduler whenever information about a disk error is requested.
  • In a modified embodiment, the logical block address of the disk error is taken from a log-list already provided by the disk drive.
  • Such a log-list is provided, for instance, by hard disks supporting the Streaming feature set, which is part of the ATA standard for future hard-disk drives.
  • A log-list is preferably supported by a feature set specifically adapted for logging, for instance the General Purpose Logging feature set.
  • The ATA standard provides a common attachment interface for system manufacturers, system integrators, software suppliers and suppliers of intelligent storage devices with regard to data streaming and real-time requirements.
  • This standard includes a packet command feature set implemented by devices commonly known as ATAPI devices.
  • The Streaming feature set and/or, advantageously, the General Purpose Logging feature set is an optional feature set that allows the host to request delivery of data from a contiguous logical block address range within an allotted time, placing priority on the time taken to access the data rather than on the integrity of the data.
  • Such a feature set comprises the mentioned log-list of all sectors and logical block addresses that have been re-assigned. Therefore, for future disk drives comprising such an ATA feature set, a specific measurement to localize a disk error and identify its logical block address is unnecessary, which is a particular advantage of this modified embodiment.
  • Preferably, the scheduler takes all sectors with a performance penalty into account; however, even if only a log-list of re-assigned sectors is available, this already results in a significant increase in the performance efficiency of the hard disk and the data storage system.
  • Once the logical block address of a disk error is known, whatever the kind of error (erroneous data, re-assigned sectors or physical disk errors), the performance penalty due to the disk error still has to be determined.
  • Several modifications are available to determine this performance penalty.
  • For instance, the access time for three or more consecutive sectors may be measured, covering the disk error location and neighboring error-free locations of the disk.
  • A worst-case value of the access time is estimated and used to estimate the reduction of the performance behavior of the disk.
  • Further modifications may be implemented without departing from the spirit of the proposed concept.
  • The estimation of service time may take into account the starting position of the read/write head of the disk and its target address. Further, the load of the disk and the total number of disk errors can be taken into account. One may also use an estimated, measured or calculated value indicating the time needed to access a location on the disk given a certain number of disk errors.
  • A disk error may also be localized on the disk, and the performance penalty due to it determined, by the advantageous method described in the following.
  • Its steps may be executed on an arbitrarily chosen, predetermined preliminary range of sectors/blocks on the disk.
  • In a first step, an access is made to a number of sectors/blocks of the range.
  • That is, these sectors are read from or written to the disk.
  • The service time for this access is measured.
  • The measured service time is compared with the service time predicted by the estimation performed on the basis of a disk model describing the performance behavior of the disk.
  • This estimated service time does not yet take into account a reduction of the performance behavior of the disk caused by a disk error and is referred to as the estimated nominal service time; in general it is too optimistic.
  • It is then checked whether the measured service time exceeds the estimated nominal service time. If so, the selected range of sectors may contain a bad sector producing a performance penalty.
  • The performance penalty may already be defined as the difference between the measured service time and the optimistic estimated service time. Alternatively, the measured service time, or a worst-case value thereof, may already at this point be regarded as the realistic service time and taken into account in the estimation as the reduction of the performance behavior of the disk caused by the disk error.
  • The above steps may be repeated to confirm the preliminary result that the selected range of sectors contains a bad sector producing a performance penalty. If the selected sector range still shows a performance penalty upon repetition, it is reasonably likely that a bad sector indeed exists in this range.
  • The bad sector can then be localized by a binary search, which comprises the steps of: splitting the selected range of sectors into two halves; measuring the service time of one of the halves; continuing with the half that produces the performance penalty; and repeating these steps of splitting, measuring and continuing with the penalty-producing half until the remaining half contains only the bad sector.
  • To determine the penalty, a first access time for accessing the disk error location is compared with a reference time representative of accessing an error-free location on the disk, yielding a difference value in access time.
  • The reference time may be a second, measured access time; however, it is more advantageous to use the estimated nominal service time outlined above.
  • In one example, the difference value in access time, i.e. the performance penalty, amounts to 2 milliseconds. Depending on the number and location of disk errors, such a difference value may be taken into account when estimating the reduction of the performance behavior of the disk.
  • The real-time file system 2a, or just the scheduler 7, may be implemented in a data storage system 200 as presented in Figure 2, which is an embodiment of the data storage system according to the invention.
  • The data storage system 200 comprises a host processor 201, a ROM memory 202, a hard-disk drive system 203, a RAM memory 204, a Direct Memory Access (DMA) unit 205, an I/O controller 206 connected to a multitude of connectors 207, and a disc drive 208.
  • The method according to the invention and further embodiments of this invention may be performed by the host processor 201.
  • To enable the host processor 201 to perform this method, computer-readable code stored in the ROM memory 202 is read by the host processor 201.
  • The host processor 201 is thereby enabled to optimize the performance of the hard-disk drive system 203 in the transfer of data between the hard-disk drive system 203 and the host processor 201, or between the hard-disk drive system 203 and the RAM memory 204 via the DMA unit 205.
  • Data may be further processed and output via the I/O controller 206 and the multitude of connectors 207 to a printer, monitor, loudspeaker or any other kind of output device.
  • Data to be processed or stored in the hard-disk drive system 203 may also be received via the I/O controller 206 and the multitude of connectors 207 from a keyboard, mouse or any other kind of input device.
  • The computer-executable code may also be stored on a CD-ROM 250, from which data can be read by means of the disc drive 208.
  • The data read from the CD-ROM are stored in the host processor 201.
  • Disk errors such as erroneous sectors, sector re-assignments and erroneous data on a disk usually result in a performance penalty of a data storage disk such as a hard disk.
  • Contemporary concepts of real-time disk-scheduling do not take such a performance penalty into account when scheduling pending requests to the disk; therefore, a contemporary scheduler's nominal estimation of the service time of a hard disk is in general too optimistic. Consequently, a contemporary hard disk is very likely to miss the service time restrictions set by the scheduler. Moreover, such a performance penalty is in general not known in time.
  • The invention proposes a concept of real-time disk-scheduling wherein pending requests to the disk are arranged so as to be processed efficiently, taking into account, for each request, an estimation of the service time of the disk under a given condition of the performance behavior of the disk. It is proposed that the estimation is performed on the basis of a disk model describing the performance behavior of the disk, and that the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error. In particular, it is proposed to either calculate or measure the performance penalty and to take the additional time needed for data extraction into account when scheduling the extraction of data from the hard-disk drive. Thereby, realistic timing estimates can be made for extracting data from, or storing data to, a hard-disk drive system. Such information may in particular be required when audio-video data are stored on the hard-disk drive and real-time requirements apply when extracting data from a mass-storage medium.
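The measurement procedure described above (compare a measured range access with the nominal estimate, confirm by repetition, then narrow down by binary search) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `measure_ms` and `nominal_ms` are hypothetical hooks standing in for a real timed disk access and for the error-free disk-model estimate, the `tolerance_ms` noise margin is an assumption, and the search assumes a single bad sector in the range.

```python
from typing import Callable, Optional

# Hypothetical hooks: Timer(lba, n) -> milliseconds for one access of n sectors.
# measure_ms performs and times a real access; nominal_ms is the disk-model
# estimate that ignores disk errors (the "estimated nominal service time").
Timer = Callable[[int, int], float]


def excess_ms(measure_ms: Timer, nominal_ms: Timer, lba: int, n: int) -> float:
    """Measured minus nominal service time for one access to [lba, lba + n)."""
    return measure_ms(lba, n) - nominal_ms(lba, n)


def confirmed_penalty_ms(measure_ms: Timer, nominal_ms: Timer, lba: int, n: int,
                         repeats: int = 2, tolerance_ms: float = 0.1) -> Optional[float]:
    """Repeat the access and report a penalty only if every repetition is too slow.

    Returns the worst-case excess over the nominal estimate, or None if the
    range met its nominal estimate at least once (no confirmed bad sector).
    """
    worst = 0.0
    for _ in range(repeats):
        excess = excess_ms(measure_ms, nominal_ms, lba, n)
        if excess <= tolerance_ms:
            return None
        worst = max(worst, excess)
    return worst


def locate_bad_sector(measure_ms: Timer, nominal_ms: Timer, lba: int, n: int,
                      tolerance_ms: float = 0.1) -> Optional[int]:
    """Binary search for a (single, assumed) bad sector inside [lba, lba + n)."""
    if confirmed_penalty_ms(measure_ms, nominal_ms, lba, n, tolerance_ms=tolerance_ms) is None:
        return None
    while n > 1:
        half = n // 2
        if excess_ms(measure_ms, nominal_ms, lba, half) > tolerance_ms:
            n = half                       # penalty lies in the first half
        else:
            lba, n = lba + half, n - half  # otherwise it must be in the second half
    return lba
```

With a simulated disk whose sector 1337 costs an extra 2 ms, searching the range starting at LBA 1000 with 1024 sectors returns 1337 after about ten half-range measurements, and `confirmed_penalty_ms` on that single sector yields roughly the 2 ms difference value.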

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Disk errors such as erroneous sectors, sector re-assignments and erroneous data usually result in a performance penalty of a data storage disk. Known real-time disk-scheduling methods do not take such a performance penalty into account when scheduling requests; therefore, a contemporary scheduler's nominal estimation of the service time of a hard disk is in general too optimistic. Consequently, a contemporary hard disk may miss the service time restrictions set by the scheduler. The invention proposes disk-scheduling wherein pending requests are arranged to be processed by the disk by taking into account an estimation of the service time of the disk, under a given condition of the performance behavior of the disk, when setting time restrictions for a request. It is proposed that the estimation is performed on the basis of a model describing the performance behavior of the disk and taking into account the reduction of the performance of the disk caused by errors.

Description

Method of real-time disk-scheduling, disk-scheduler, file system, data storage system and computer program product
The invention relates to a method of real-time disk-scheduling wherein pending requests to the disk are arranged so as to be processed efficiently, taking into account, for each request, an estimation of the disk service time under a given condition of the performance behavior of the disk. Further, the invention relates to a disk-scheduler wherein pending requests to the disk are arranged so as to be processed efficiently in real time, taking into account, for each request, an estimation of the service time of the disk under a given condition of the performance behavior of the disk. The invention also relates to a file system, a data storage system and a computer program product.
For real-time hard-disk data processing, a real-time file system is usually used. Conventional data processing systems aim for maximum data integrity, for instance by not completing a command until it has been properly executed. Traditional data-oriented processing systems have no real-time requirements; aiming for maximum data integrity is appropriate primarily for traditional types of data, such as information technology data, which are reliability-critical. For such information technology data and/or reliability-critical data, a device should not return errors to the host regarding a data transfer until all possible data recovery procedures have been exhausted.
However, such a concept has major disadvantages for streaming data such as audio-video data, which demand high processing performance and efficiency. The data layout, the host, peripheral devices and the file system usually have to be adapted to a concept of real-time processing. Stream data, such as audio-video data, have to be processed within certain time limits. A stream is the time-based transfer of data to or from a device. Stream data are defined as time-critical data, unlike the information technology data defined above. Data streaming therefore has to fulfill a variety of different requirements. A stream is composed of one or more allocation units. An allocation unit is the smallest logically contiguous group of blocks on a storage medium. Each unit may be accessed by one or more requests to the disk. All logical block addresses associated with a single allocation unit are logically contiguous. The numbers of logical block addresses in the requests and allocation units are important for maintaining the stream data rate. For instance, requests should be chosen such that the stream transfer rate requirements of the storage medium or disk are not compromised.
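Why request size matters for the stream data rate can be made concrete with a little arithmetic. The sketch below is a simplification, assuming a single stream consumed at a constant rate with no extra buffer slack; the function names, the 512-byte block size and the rate used in the example are illustrative, not taken from the source.

```python
def request_budget_ms(blocks: int, block_bytes: int, stream_rate_bytes_per_s: float) -> float:
    """Time available to serve one request if the stream is to keep its data rate.

    Simplification: one stream, consumed at a constant rate, with no buffer slack.
    """
    return 1000.0 * blocks * block_bytes / stream_rate_bytes_per_s


def sustains_stream(service_ms: float, blocks: int, block_bytes: int,
                    stream_rate_bytes_per_s: float) -> bool:
    """True if a request of this size is served at least as fast as it is consumed."""
    return service_ms <= request_budget_ms(blocks, block_bytes, stream_rate_bytes_per_s)
```

For example, with 512-byte blocks, a 2048-block request on a stream of 1,000,000 bytes/s has a budget of about 1048.6 ms. Larger requests leave more of that budget to absorb the seek and rotational latency that may occur between requests.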
Within such a concept, stream data and, in particular, audio-video data have a time-based orientation. Such data should be delivered within certain time constraints.
Otherwise, such data become essentially worthless. For example, late delivery of the data for a new video frame during movie playback will cause visible or audible artifacts such as skipping, audio noise or video frame corruption. Whereas information technology data may be delayed without visible effects, stream data have to be processed with strict time orientation. In typical real-time systems or applications, the quality of the data is therefore of secondary interest; instead, the timeliness of delivery is of paramount importance.
To meet real-time requirements, a real-time file system usually comprises a hard-disk request scheduler. The hard-disk request scheduler arranges requests from the file system such that they are processed efficiently. In particular, it has to be guaranteed that real-time streaming performance requirements are met. For example, it should be guaranteed that no seeks are needed within a single request; a seek may only be allowed between two subsequent requests or accesses without affecting the stream transfer rate requirements. For this purpose, the seek time and the rotational latency of the disk should be taken into account, for example. Therefore, a hard-disk scheduler takes into account, for each request, an estimation of the service time of the disk under a given condition of the performance behavior of the disk. In general, a hard-disk request scheduler predicts a nominal service time for each request.
In particular, this means that the actual service time of a hard-disk request should not exceed the service time estimated by the scheduler.
A contemporary real-time file system is described in principle in WO 01/25892 A1. There, a bandwidth allocator is used to allocate the bandwidth of the storage system or disk. This information is used to order requests to the disk in a guaranteed-rate queue according to their deadlines. Other requests are ordered in a non-rate-guaranteed queue according to their priorities.
However, known approaches to estimating the service time of a disk are restricted to non-real-time applications. Such approaches are known in principle from US 6,260,108 B1 regarding read requests and from US 5,854,941 regarding access requests, both for non-real-time applications.
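The two-queue ordering described for WO 01/25892 A1 can be sketched in a few lines. This is a minimal illustration, not the cited system's implementation: the `Request` fields, the `TwoQueueScheduler` name and the policy of always serving the guaranteed-rate queue before the other queue are assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    lba: int                          # first logical block address
    blocks: int                       # number of blocks to transfer
    deadline: Optional[float] = None  # set only for guaranteed-rate (real-time) requests
    priority: int = 0                 # lower value = more important (other requests)


class TwoQueueScheduler:
    """Hypothetical sketch: deadline-ordered guaranteed-rate queue plus
    priority-ordered non-rate-guaranteed queue."""

    def __init__(self) -> None:
        self._guaranteed: list = []
        self._best_effort: list = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal keys

    def submit(self, req: Request) -> None:
        if req.deadline is not None:
            heapq.heappush(self._guaranteed, (req.deadline, next(self._seq), req))
        else:
            heapq.heappush(self._best_effort, (req.priority, next(self._seq), req))

    def next_request(self) -> Optional[Request]:
        # Assumed policy: earliest deadline first, then the highest-priority other request.
        if self._guaranteed:
            return heapq.heappop(self._guaranteed)[2]
        if self._best_effort:
            return heapq.heappop(self._best_effort)[2]
        return None
```

Submitting one best-effort request and two real-time requests with deadlines of 40 ms and 20 ms yields the order: the 20 ms request, the 40 ms request, then the best-effort request. The sequence counter keeps the heap from ever comparing two `Request` objects directly.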
Concepts for service time estimation specifically adapted to real-time applications have not yet been reported. A major problem arises because contemporary concepts of real-time disk-scheduling do not take disk errors into account when estimating the service time of the disk. A disk error may arise from physical errors on a hard disk. Physical errors may be handled by a prior-art defect management method, as known e.g. from GB 2 285 166 A, relying on an allocation and re-allocation scheme for erroneous sectors on a hard disk, sometimes referred to as a skip-and-slip scheme. Within defect management methods of this kind, errors may be concealed or corrected by writing data originally scheduled to be written to a defective sector of the hard-disk drive to some other area of the drive. In this sense, the erroneous sectors are referred to as re-assigned sectors. Applying such a conventional scheme has certain disadvantages. In particular, because the data transfer head usually has to perform a track switch or a seek to a remote spare sector, or at least cannot read the originally scheduled sequence of physical block addresses, this usually results in a performance penalty of the data storage disk. Such a performance penalty can also arise from erroneous data that may not have been written to the disk successfully, for instance due to time restrictions. As a result, the data of the stream stored on the disk may be essentially undefined: they may be either erroneous or belong to a totally different stream. All the cases referenced above and cases of a similar kind, i.e. all cases in which data have not been written to the disk properly as predetermined by a host, are referred to as disk errors in the following.
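A re-assigned sector breaks the physical contiguity of an otherwise sequential request, which is where the track switches and seeks mentioned above come from. The sketch below counts those breaks. It assumes, purely for illustration, an identity mapping from logical to physical block addresses plus a hypothetical re-assignment table `remapped`; real drives keep this mapping internally and do not expose it in this form.

```python
from typing import Dict, List


def physical_sequence(first_lba: int, sectors: int, remapped: Dict[int, int]) -> List[int]:
    """Physical block addresses visited by a sequential read of an LBA range.

    remapped: hypothetical re-assignment table (LBA -> spare PBA). All other
    LBAs are assumed, for simplicity, to map 1:1 onto PBAs.
    """
    return [remapped.get(lba, lba) for lba in range(first_lba, first_lba + sectors)]


def discontinuities(pbas: List[int]) -> int:
    """Number of points where the head cannot simply continue with the next PBA."""
    return sum(1 for a, b in zip(pbas, pbas[1:]) if b != a + 1)
```

Re-assigning a single sector in the middle of a five-sector read to a distant spare location produces two discontinuities (out to the spare sector and back), whereas the same read without re-assignment has none. A scheduler that ignores this cost underestimates the service time of every request covering such a sector.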
Contemporary disk-scheduling methods do not take disk errors such as physical errors, re-assigned sectors or erroneous data into account when estimating the service time of the disk, basically because either the locations of such disk errors on the data storage disk or the performance penalty they cause is not known.
As a result of disk errors, a contemporary scheduler's nominal estimation of the service time of a hard-disk request is in general too optimistic, because disk errors are neglected. This usually means that a disk request will not meet its deadline and the real-time streaming guarantees will not be met. This may result in buffer overflows or underflows. In particular, in real-time and audio-video recording applications this may result in substantial data loss and a reduction in the quality of the streamed data. This is where the invention comes in. Its object is to provide a method and an apparatus capable of improving the estimation of the service time of a data storage disk and/or data storage system, so as to achieve a more realistic prediction of the service time of a disk request and thereby increase the performance efficiency of the disk and/or the data storage system.
As regards the method, the object is achieved by a method as referenced in the introduction, wherein, in accordance with the invention, the estimation is performed on the basis of a disk model describing the performance behavior of the disk, and the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error.
The invention is based on the realization that, by applying the proposed concept, disk request deadline violations resulting in buffer overflows and underflows are avoided even when disk errors such as physical errors, re-assigned sectors or erroneous data, which usually cause a performance penalty, are accessed. Using such performance penalty information in a real-time disk-scheduling concept yields a more realistic and more reliable estimation of the performance behavior of the disk, and therefore a more reliable and more realistic prediction of the service time of the disk. An essential advantage of the proposed concept is that the scheduler can guarantee real-time performance of the disk more reliably for streaming data, whereas in prior-art concepts buffer overflows and underflows often result from incorrect time estimations. The proposed concept thus increases the performance efficiency of the disk and/or of the data storage system, and the performance of a host system also benefits.
Developed configurations of the inventive method are further outlined in the dependent claims.
Most preferably, the estimation is performed by means of a disk-scheduler. The disk-scheduler is advantageously part of a real-time file system, in particular of a file system layer of a real-time file system. For the estimation, the disk-scheduler may use the disk model. The applied disk model should be able to describe the complete behavior of the hard-disk. Such a model may comprise several time parameters, such as a host queue time and a device queue time, which mainly account for the time that a request waits in the host or in the disk while previous requests are being served. Furthermore, mechanical parameters of the disk, in particular of the disk head, may be modeled, such as a seek time or a jump time. The rotation of the disk may be modeled by time parameters such as a rotational latency time and a rotational transfer time. Bus delays, such as a bus busy time and a bus transfer time, may also be included to provide a complete model of the performance behavior of the hard disk and its periphery. By additionally taking into account the reduction of the performance behavior of the disk caused by a disk error according to the proposed concept, a more realistic and more reliable estimation of the service time is obtained; when a request is issued, the proposed concept is able to predict a more reliable service time for that request.
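The time components listed above can be summed into a simple service-time estimate. The following Python sketch is only an illustration under assumed parameters: the class name `DiskModel`, its field names and all default values (in milliseconds) are hypothetical, and real disk models are considerably more detailed. Known disk errors are represented as a mapping from logical block address to extra access time, and their penalties are added to the nominal, error-free estimate whenever they fall inside the requested range.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DiskModel:
    """Hypothetical disk model; all times are in milliseconds."""
    queue_time: float = 0.0                # host + device queue time
    seek_time: float = 4.0                 # seek / jump time of the head
    rotational_latency: float = 3.0        # waiting for the sector to reach the head
    transfer_time_per_block: float = 0.05  # rotational transfer time per block
    bus_time: float = 0.2                  # bus busy time + bus transfer time
    # Known disk errors: LBA -> extra access time caused by the error.
    error_penalties: Dict[int, float] = field(default_factory=dict)

    def nominal_service_time(self, n_blocks: int) -> float:
        """Error-free estimate: the sum of the model's time components."""
        return (self.queue_time + self.seek_time + self.rotational_latency
                + n_blocks * self.transfer_time_per_block + self.bus_time)

    def service_time(self, start_lba: int, n_blocks: int) -> float:
        """Nominal estimate plus the penalties of known errors inside
        the range [start_lba, start_lba + n_blocks)."""
        penalty = sum(p for lba, p in self.error_penalties.items()
                      if start_lba <= lba < start_lba + n_blocks)
        return self.nominal_service_time(n_blocks) + penalty
```

For example, with `error_penalties={1005: 2.0}`, a 100-block request starting at LBA 1000 is estimated 2 ms above the nominal value, whereas a request that does not cover LBA 1005 is estimated nominally.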
Since in most cases neither the location of a disk error nor the performance penalty due to the disk error is known, it is particularly preferred in a further developed configuration of the method that, in a first step, the disk error is localized and, in a further step, the performance penalty due to the disk error is determined, in order to determine the reduction of the performance behavior of the disk due to the disk error. This reduction may subsequently be taken into account for the estimation of the service time of the disk as outlined above.
A disk error may be localized in several advantageous ways: either by a specific measurement or in the course of an access to the disk, e.g. a read or a write access. A write access is particularly advantageous, since an error can then be detected already when writing to the disk, so that it is already known when the disk is read. This largely avoids further delays. It may also be advantageous to take the logical block address of the disk error from a log-list provided by the disk drive. It is particularly preferred that the logical block address of the disk error is identified and advantageously registered at a file system layer.
The performance penalty of the disk error may be determined either by measurement or by calculation, the latter preferably on the basis of suitably provided data. It is particularly preferred to localize the disk error and/or determine the performance penalty due to the disk error in the course of a single measurement in which one or more accesses to the disk are made. Further developed configurations thereof are outlined in dependent claims 9 to 12. Preferred embodiments of these and other configurations of the proposed method are outlined in the detailed description.
As regards the apparatus, the object is achieved by a disk-scheduler as mentioned in the introduction, wherein according to the invention the disk-scheduler comprises means for performing the estimation on the basis of a disk model describing the performance behavior of the disk, and means for providing information on a reduction of the performance behavior of the disk caused by a disk error.
The means for performing the estimation may be any kind of software code section capable of calculating, on the basis of a disk model, a service time for a pending request to the disk, thereby describing the performance behavior of the disk.
The means for providing information may be any kind of data storage, data delivery or software means capable of making the information obtained from disk error localization and performance penalty determination available to the means for performing the estimation of the service time of the disk under the given performance behavior of the disk. The invention also leads to a file system comprising such a disk-scheduler.
The invention further leads to a data storage system, e.g. a host, comprising a disk or a disk system and a file system as outlined above, wherein a device driver is provided for communication between the disk and the file system.
In a further preferred configuration, the data storage system comprises an application layer on top of the file system layer, wherein an application programming interface is provided for communication between the application and the file system.
The invention further leads to a computer program product storable on a medium readable by a computer system, comprising a software code section which induces the computer system to execute the proposed inventive method when the product is executed on the computer system.
The invention will now be described in detail with reference to the accompanying drawing. The detailed description will illustrate and describe what is considered as a preferred embodiment of the invention. It should, of course, be understood that various modifications and changes in form or detail could readily be made without departing from the spirit of the invention. It is therefore intended that the invention may not be limited to the exact form and detail shown and described herein, nor to anything less than the whole of the invention disclosed herein and as claimed hereinafter. Further the features described in the description, the drawing and the claims disclosing the invention may be essential for the invention considered alone or in combination.
The Figure of the drawing illustrates in:
Figure 1: A scheme of a data storage system adapted for real-time applications, wherein a performance penalty due to a disk error is determined, provided to a hard-disk-drive scheduler and taken into account in the course of an estimation of the service time of the disk under a given condition of a performance behavior of the disk.
Figure 1 shows a preferred embodiment of a layered architecture of a data storage system having an application layer 1 which uses a real-time file system 2a to access a hard disk 4 via a device driver 3. The real-time file system 2a comprises a file system layer 2 for communication with two application programming interfaces 5, 6. One application programming interface 5 is adapted mainly for non-real-time applications, i.e. for best-effort PC-like file access, and provides normal file access functions such as OPEN, CLOSE, READ, WRITE, SEEK, etc. to handle information technology data. A second application programming interface 6 is adapted to treat files as real-time streams, e.g. with audio-visual data, by means of functions such as START_STREAM, STOP_STREAM, PAUSE_STREAM, etc. to handle stream data. Further, the file system layer 2 creates requests and issues them to a hard-disk request scheduler 7. The scheduler 7 determines the type of each request, i.e. best-effort or real-time, and arranges and orders the pending requests to the disk such that they are processed effectively. The scheduler 7 also predicts the service time of the disk, the estimation being performed on the basis of a disk model describing the performance behavior of the disk and taking into account a reduction of the performance behavior of the disk caused by a disk error. The disk errors are identified and registered either in the file system layer 2 or in the device driver layer 3. They are reported to the scheduler 7 by the disk 4 via the driver 3. Read and write accesses and the respective communication of requests and information are indicated by arrows in Figure 1. In the preferred embodiment, the scheduler thus makes realistic service time predictions.
Consequently, the file system and/or the data storage system is able to meet the real-time requirements within a deadline predetermined according to the service time predictions.
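The ordering and deadline check performed by the scheduler 7 can be sketched as follows. This is a hypothetical Python illustration, not the scheduler of the embodiment: the names `Request`, `order_requests` and `missed_deadlines` are invented, real-time requests are ordered earliest-deadline-first (an assumed policy that the text does not prescribe), and best-effort requests are simply appended in arrival order. The `estimate` callable stands for any service-time estimate, with or without disk-error penalties.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    name: str
    real_time: bool                    # True for stream requests, False for best-effort
    start_lba: int
    n_blocks: int
    deadline_ms: float = float("inf")  # only meaningful for real-time requests


def order_requests(pending: List[Request]) -> List[Request]:
    """Real-time requests first, earliest deadline first (assumed policy),
    then best-effort requests in arrival order."""
    rt = sorted((r for r in pending if r.real_time), key=lambda r: r.deadline_ms)
    be = [r for r in pending if not r.real_time]
    return rt + be


def missed_deadlines(schedule: List[Request],
                     estimate: Callable[[int, int], float]) -> List[str]:
    """Accumulate estimated service times along the schedule and return the
    names of real-time requests predicted to finish after their deadline."""
    now = 0.0
    missed = []
    for r in schedule:
        now += estimate(r.start_lba, r.n_blocks)
        if r.real_time and now > r.deadline_ms:
            missed.append(r.name)
    return missed
```

With a nominal estimate of 10 ms per request, two real-time requests with deadlines of 20 ms and 25 ms both appear feasible. If the request covering a known disk error actually needs 8 ms more, the second real-time request is predicted to miss its deadline, which the nominal estimate would not reveal.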
As a basis of the outlined estimation in the preferred embodiment a disk error is localized on the disk 4 and a performance penalty due to the disk error is determined by the scheduler 7.
In a first modification of the preferred embodiment regarding the localization of a disk error, the disk error is localized by a specific measurement in which one or more accesses to the disk are made. Such a specific measurement may be made once, solely to localize a disk error and identify its logical block address.
In a further modification of the preferred embodiment regarding the localization of a disk error, such a measurement may also be performed during or immediately after an access to the disk, either a write or a read access. The logical block addresses of disk errors and/or other relevant data may be registered by any means that make the registered data accessible and available to the system and allow them to be provided to the scheduler properly and efficiently.
One possibility, for instance, is to register the disk error at a file system layer, preferably by means of the disk-scheduler. Another possibility is to register this information in a log file stored on a storage medium or storage device, i.e. on the disk. Such a log file could be read by the scheduler whenever information about a disk error is requested. A particularly preferred further possibility is that the logical block address of the disk error is taken from a log-list already provided by the disk drive. Such a log-list is, for instance, provided by a hard-disk supporting the streaming feature set as part of the ATA standard for future hard-disk drives. In particular, a log-list is preferably supported by a feature set specifically adapted for logging, for instance the general purpose logging feature set.
The ATA standard (AT Attachment Interface) provides a common attachment interface for system manufacturers, system integrators, software suppliers and suppliers of intelligent storage devices, also with regard to data streaming and real-time requirements. The standard includes a packet command feature set implemented by devices commonly known as ATAPI devices. The streaming feature set and/or, advantageously, the general purpose logging feature set is an optional feature set that allows the host to request delivery of data from a contiguous logical block address range within an allotted time, priority being placed on the time taken to access the data rather than on the integrity of the data. Such a feature set comprises the mentioned log-list of all sectors, by logical block address, that have been re-assigned. Therefore, for future disk drives supporting such an ATA feature set, a specific measurement to localize a disk error and identify its logical block address is unnecessary, which is a particular advantage of this modified embodiment.
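Registration of disk error locations, e.g. from a re-assigned sector log-list, can be sketched with a sorted list of logical block addresses. In this hypothetical Python example, `DiskErrorRegistry` and its methods are invented names, and the log-list is assumed to have already been read from the drive and parsed into integers; no ATA command is issued here. Range queries use binary search via the standard `bisect` module, so the scheduler can quickly find the errors covered by a request.

```python
import bisect
from typing import Iterable, List


class DiskErrorRegistry:
    """Hypothetical file-system-layer registry of LBAs known to carry a penalty."""

    def __init__(self, lbas: Iterable[int] = ()) -> None:
        # e.g. LBAs already parsed from the drive's re-assigned sector log-list
        self._lbas: List[int] = sorted(set(lbas))

    def register(self, lba: int) -> None:
        """Add an LBA found by measurement or during an access; keeps the list
        sorted and ignores duplicates."""
        i = bisect.bisect_left(self._lbas, lba)
        if i == len(self._lbas) or self._lbas[i] != lba:
            self._lbas.insert(i, lba)

    def errors_in_range(self, start_lba: int, n_blocks: int) -> List[int]:
        """Return the registered error LBAs inside [start_lba, start_lba + n_blocks)."""
        lo = bisect.bisect_left(self._lbas, start_lba)
        hi = bisect.bisect_left(self._lbas, start_lba + n_blocks)
        return self._lbas[lo:hi]
```

For instance, a registry built from the LBAs 120, 4096 and 9000, with 5000 registered later after a write access, reports 4096 and 5000 for a request of 1500 blocks starting at LBA 4000.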
Advantageously, the scheduler takes all sectors with a performance penalty into account; however, even if only a log-list of re-assigned sectors is available, this already results in a significant increase in the performance efficiency of the hard-disk and the data storage system. Once the logical block address of a disk error, which may be any kind of error such as erroneous data, re-assigned sectors or physical disk errors, is known, the performance penalty due to the disk error still has to be determined. In the preferred embodiment, several modifications are available for determining this performance penalty. In a first modification of the preferred embodiment regarding performance penalty determination, for instance, the access time for three or more consecutive sectors, including the disk error and neighboring error-free locations of the disk, may be measured. Based on such a read access, a worst-case value of the access time is estimated and used to estimate the reduction of the performance behavior of the disk. Further modifications may, of course, be implemented without departing from the spirit of the proposed concept. For instance, the estimation of the service time may take into account the starting position of a read/write head of the disk and the target address of the head. Further, the load of the disk and the total number of disk errors can be taken into account appropriately. One may also choose an estimated, measured or calculated value indicating the time to access a location of the disk under a given condition of a certain number of disk errors.
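The first modification, measuring three or more consecutive sectors around the disk error and taking the worst case, could look like the following sketch. The function name and its parameters are hypothetical, and the access times are assumed to have been measured already; the penalty is taken relative to the nominal, error-free access time and clipped at zero.

```python
from typing import Sequence


def worst_case_penalty(access_times_ms: Sequence[float], nominal_ms: float) -> float:
    """Worst-case extra access time for a disk error (hypothetical helper).

    access_times_ms: measured access times of three or more consecutive
    sectors, i.e. the disk error and its error-free neighbours.
    nominal_ms: estimated access time of an error-free sector.
    """
    if len(access_times_ms) < 3:
        raise ValueError("measure at least three consecutive sectors")
    worst = max(access_times_ms)         # worst-case access time observed
    return max(0.0, worst - nominal_ms)  # never report a negative penalty
```

Measured times of 3.1, 5.0 and 3.0 ms against a 3.0 ms nominal value give a worst-case penalty of 2.0 ms.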
In a particularly preferred embodiment, a disk error is localized on the disk and the performance penalty due to the disk error is determined by the advantageous method described in the following. Its steps may be executed on an arbitrary, predetermined, preliminary range of sectors/blocks on the disk.
In a first step, a number of sectors/blocks of the range are accessed; in particular, the number of sectors is read from or written to the disk. In a further step, the service time of this access is measured.
In another step, the measured service time is compared with the estimated service time predicted by the estimation performed on the basis of a disk model describing the performance behavior of the disk. Here the estimated service time does not yet take into account a reduction of the performance behavior of the disk caused by a disk error; it will be referred to as the estimated nominal service time and will in general be too optimistic.
In still a further step, it is checked whether the measured service time exceeds the estimated nominal service time. If so, the selected range of sectors might contain a bad sector producing a performance penalty. The performance penalty may already be defined here as the difference between the measured service time and the estimated nominal service time. If preferred, the measured service time or a worst-case value thereof may already be regarded here as the realistic service time and taken into account in the estimation as the reduction of the performance behavior of the disk caused by the disk error.
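The steps of accessing a range, measuring its service time and comparing it with the estimated nominal service time can be sketched as below. The `access` callable is a hypothetical stand-in for an actual read or write of the range (e.g. an unbuffered read from the block device, which is platform-specific and not shown), and the `tolerance_ms` parameter is an added assumption to absorb measurement jitter; it is not part of the described method. The penalty is returned as the difference between measured and nominal time.

```python
import time
from typing import Callable, Optional


def measure_service_time(access: Callable[[int, int], None],
                         start_lba: int, n_blocks: int) -> float:
    """Time one access of n_blocks starting at start_lba; returns milliseconds."""
    t0 = time.perf_counter()
    access(start_lba, n_blocks)
    return (time.perf_counter() - t0) * 1000.0


def range_penalty(measured_ms: float, nominal_ms: float,
                  tolerance_ms: float = 0.0) -> Optional[float]:
    """Return the performance penalty (measured minus nominal) if the measured
    service time exceeds the estimated nominal service time, else None."""
    if measured_ms > nominal_ms + tolerance_ms:
        return measured_ms - nominal_ms
    return None
```

A measurement of 5.0 ms against a nominal estimate of 3.0 ms yields a penalty of 2.0 ms, while a measurement equal to the nominal estimate yields `None`.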
In a more elaborate configuration of the above method, the above steps may be repeated to confirm the provisional result that the selected range of sectors might contain a bad sector producing a performance penalty. If, upon repetition of these steps, the selected sector range still shows a performance penalty, then there is a high probability that a bad sector indeed exists in the selected sector range.
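The confirmation by repetition might be expressed as follows. Requiring every repetition to exceed the nominal estimate, and the default of three repetitions, are assumptions made for this hypothetical Python sketch; `measure` stands for one execution of the access-and-measure steps on the selected range.

```python
from typing import Callable


def penalty_confirmed(measure: Callable[[], float], nominal_ms: float,
                      repeats: int = 3) -> bool:
    """Repeat the access-and-measure steps; report a likely bad sector only
    if every repetition exceeds the estimated nominal service time.
    all() stops at the first repetition that shows no penalty."""
    return all(measure() > nominal_ms for _ in range(repeats))
```

A sequence of measurements such as 5.0, 4.8 and 5.1 ms against a 3.0 ms nominal estimate confirms the penalty; a single repetition at or below the nominal value rejects it.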
In principle, two possibilities may be applied to localize the disk error, i.e. the bad sector; either may be chosen depending on the situation.
In a first possibility at least the above referenced first four steps are repeated and applied to each one of the sectors in the range to thereby identify the bad sector.
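The first possibility, testing every sector individually, amounts to a linear scan. In this hypothetical sketch, `measure(lba, n)` is an assumed callable returning the measured service time of `n` sectors starting at `lba`, and `nominal_per_sector_ms` is the estimated nominal service time of a single sector.

```python
from typing import Callable, List


def scan_each_sector(measure: Callable[[int, int], float],
                     nominal_per_sector_ms: float,
                     start_lba: int, n_sectors: int) -> List[int]:
    """Measure every sector of the range individually and return the LBAs
    whose service time exceeds the nominal single-sector estimate."""
    return [lba for lba in range(start_lba, start_lba + n_sectors)
            if measure(lba, 1) > nominal_per_sector_ms]
```

This costs one measurement per sector in the range, but it also finds several bad sectors in one pass.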
In a second possibility, a fast binary search may be executed to find the bad sector in the range. A binary search in particular comprises the steps of:
- splitting the selected range of sectors into two halves;
- measuring the service time of one of the halves;
- continuing with the half that produces the performance penalty (if the measured half shows no penalty, the penalty lies in the other half);
- repeating the steps of splitting, measuring and continuing with the penalty-producing half until the remaining half contains only the bad sector.
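The binary search can be sketched as follows, assuming, as in the text, that the range contains a single bad sector and is already known to show a penalty. `measure(lba, n)` and `nominal(n)` are hypothetical callables for the measured and the estimated nominal service time of `n` sectors starting at `lba`. Only one half is measured per iteration; if it shows no penalty, the search continues with the other half, so a range of N sectors needs about log2(N) measurements.

```python
from typing import Callable


def find_bad_sector(measure: Callable[[int, int], float],
                    nominal: Callable[[int], float],
                    start_lba: int, n_sectors: int) -> int:
    """Binary search for a single bad sector in [start_lba, start_lba + n_sectors)."""
    lo, n = start_lba, n_sectors
    while n > 1:
        half = n // 2
        # Measure only the first half; if it shows no penalty,
        # the bad sector must lie in the second half.
        if measure(lo, half) > nominal(half):
            n = half
        else:
            lo, n = lo + half, n - half
    return lo
```

For a range of eight sectors starting at LBA 1000 with a bad sector at LBA 1003, the search needs three measurements (1000 to 1003, 1000 to 1001, then 1002) to return 1003.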
In the measurement of the performance penalty, a first access time for accessing the disk error location is compared to a reference time representative of accessing an error-free location on the disk, to determine a difference value in access time. In principle, the reference time may be a second, measured access time; however, it is more advantageous to use the estimated nominal service time as outlined above. For example, if the first access time is 5 milliseconds and the reference time is 3 milliseconds, the difference in access time, i.e. the performance penalty, amounts to 2 milliseconds. Depending on the number and location of the disk errors, this difference value may be taken into account in estimating the reduction of the performance behavior of the disk.
Once the location of an error is known, and once it is known how much longer it takes to access a disk error location than an error-free location on the disk, this knowledge is used to estimate the service time of the disk on the basis of the model describing the whole behavior of the disk. The value may be chosen as an average value. In general, however, a worst-case value is preferred, since an average value may in certain cases be too optimistic and may therefore compromise or even destroy real-time guarantees. In any case, in the preferred embodiment an overly optimistic prediction of the deadline for a real-time request is largely prevented by the measures outlined above; real-time streaming requirements are therefore guaranteed realistically and are met by the hard-disk drive. Consequently, buffer overflows and underflows, which could result in substantial data loss or reduced audio-video quality, are prevented by the proposed preferred embodiment and its modifications.
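The choice between an average and a worst-case penalty can be illustrated with the 5 ms / 3 ms example from the text. This hypothetical sketch computes the difference between each measured access time of a disk error location and the reference (estimated nominal) time, and returns either the largest or the mean difference.

```python
from statistics import mean
from typing import Sequence


def penalty_estimate(measured_ms: Sequence[float], reference_ms: float,
                     worst_case: bool = True) -> float:
    """Penalty of a disk error: measured access time minus reference time.
    worst_case=True returns the largest observed difference (preferred for
    real-time guarantees); otherwise the mean difference is returned."""
    diffs = [m - reference_ms for m in measured_ms]
    return max(diffs) if worst_case else mean(diffs)
```

A single measurement of 5 ms against a 3 ms reference gives the 2 ms penalty of the text. For repeated measurements of 4.0, 5.0 and 4.5 ms, the mean penalty is 1.5 ms but the worst case is 2 ms; scheduling with 1.5 ms would underestimate the slowest observed access by 0.5 ms, which illustrates why the worst-case value is preferred.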
The real-time file system 2a or just the scheduler 7 may be implemented in a data storage system 200 as presented in Figure 2, which is an embodiment of the data storage system according to the invention. The data storage system 200 comprises a host processor 201, a ROM memory 202, a hard-disk drive system 203, a RAM memory 204, a Direct Memory Access (DMA) unit 205, an I/O controller 206 connected to a multitude of connectors 207, and a disc drive 208.
The method according to the invention and further embodiments thereof may be performed by the host processor 201. To enable the host processor 201 to perform this method, it reads computer-readable code stored in the ROM memory 202. In this way, the host processor 201 is enabled to optimize the performance of the hard-disk drive system 203 in transferring data between the hard-disk drive system 203 and the host processor 201, or between the hard-disk drive system 203 and the RAM memory 204 via the DMA unit 205.
Data may be further processed and output via the I/O controller 206 and the multitude of connectors 207 to a printer, monitor, loudspeaker or any other kind of output device. Data to be processed or stored in the hard-disk drive system 203 may also be received via the I/O controller 206 and the multitude of connectors 207 from a keyboard, mouse or any other kind of input device.
In a further embodiment, the computer-executable code is stored on a CD-ROM 250, from which it can be read by means of the disc drive 208. The data read from the CD-ROM are stored in the host processor 201.
In summary, disk errors such as erroneous sectors, sector re-assignments and erroneous data on a disk usually result in a performance penalty of a data storage disk such as a hard-disk. Contemporary concepts of real-time disk-scheduling do not take such a performance penalty into account when scheduling pending requests to the disk; therefore a contemporary scheduler's nominal estimate of the service time of a hard-disk is in general too optimistic. Consequently, a contemporary hard-disk is very likely to miss the service time restrictions set by the scheduler. Moreover, such a performance penalty is in general not known in time. The invention proposes a concept of real-time disk-scheduling in which pending requests to the disk are arranged to be processed effectively by taking into account, for each request, an estimation of the service time of the disk under a given condition of a performance behavior of the disk. It is proposed that the estimation is performed on the basis of a disk model describing the performance behavior of the disk, and that the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error. In particular, it is proposed either to calculate or to measure the performance penalty and to take the additional time needed for data extraction into account when scheduling the extraction of data from the hard-disk drive. In this way, realistic timing estimates can be made for extracting data from or storing data to a hard-disk drive system. Such information may be required in particular when audio-video data are stored on the hard-disk drive and real-time requirements apply when extracting data from a mass-storage medium.

Claims

1. Method of disk-scheduling wherein pending requests to a disk, to be processed, are arranged by taking for a request into account an estimation of a disk service time under a given condition of a performance behavior of the disk, characterized in that the estimation is performed on basis of a disk model describing the performance behavior of the disk and the estimation takes into account a reduction of the performance behavior of the disk caused by a disk error.
2. Method as claimed in claim 1, characterized in that the disk error is localized and a performance penalty due to the disk error is determined to determine the reduction of the performance behavior of the disk due to the disk error.
3. Method as claimed in claim 1 or 2, characterized in that the estimation is performed by means of a disk-scheduler.
4. Method as claimed in one of the preceding claims, characterized in that the disk error is localized by a specific measurement wherein one or more accesses to the disk are made.
5. Method as claimed in one of the preceding claims, characterized in that the disk error is localized during or immediately subsequent to an access to the disk, in particular a write access to the disk.
6. Method as claimed in one of the preceding claims, characterized in that a logical block address of the disk error is identified.
7. Method as claimed in one of the preceding claims, characterized in that a logical block address of the disk error is registered at a file system layer, in particular by means of a disk-scheduler.
8. Method as claimed in one of the preceding claims, characterized in that a logical block address of the disk error is taken from a log-list provided by the disk drive.
9. Method as claimed in one of the preceding claims, characterized in that the performance penalty of the disk error is determined by a further specific measurement wherein one or more accesses to the disk are made.
10. Method as claimed in claim 9, characterized in that in the measurement an access time for accessing the disk error location is compared to a reference time significant for accessing an error-free location on the disk to determine a difference-value in access time.
11. Method as claimed in claim 9 or 10, characterized in that in the measurement the disk error and neighboring error-free locations of the disk are accessed to determine a worst-case value in access time.
12. Method as claimed in one of the preceding claims, characterized in that the disk error is localized and a performance penalty due to the disk error is determined in course of a single measurement wherein one or more accesses to the disk are made.
13. Disk-scheduler wherein pending requests to a disk are arranged to be effectively processed by taking for a request into account an estimation of service time of the disk under a given condition of a performance behavior of the disk, characterized by means for performing the estimation on basis of a disk model describing the performance behavior of the disk and means for providing information on a reduction of the performance behavior of the disk caused by a disk error.
14. A file system comprising a disk-scheduler as claimed in claim 13.
15. Data storage system comprising a disk and a file system as claimed in claim 14 and a device driver for communication between the disk and the file system.
16. Data storage system as claimed in claim 15, further comprising an application layer on top of the file system layer and an application programming interface for communication between the application and the file system.
17. Computer program product storable on a medium readable by a computer system comprising a software code section which induces the computer system to execute the method as claimed in any one of the preceding method claims when the product is executed on the computer system.
PCT/IB2003/003380 2002-09-05 2003-08-05 Method of real-time disk-scheduling, disk-scheduler, file system, data storage system and computer program product WO2004023792A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003250409A AU2003250409A1 (en) 2002-09-05 2003-08-05 Method of real-time disk-scheduling, disk-scheduler, file system, data storage system and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP02078640 2002-09-05
EP02078640.6 2002-09-05

Publications (1)

Publication Number Publication Date
WO2004023792A1 true WO2004023792A1 (en) 2004-03-18

Family

ID=31970389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2003/003380 WO2004023792A1 (en) 2002-09-05 2003-08-05 Method of real-time disk-scheduling, disk-scheduler, file system, data storage system and computer program product

Country Status (2)

Country Link
AU (1) AU2003250409A1 (en)
WO (1) WO2004023792A1 (en)

Also Published As

Publication number Publication date
AU2003250409A1 (en) 2004-03-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP