WO1999015953A1 - Optimizing scheduler for read/write operations in a disk file system - Google Patents

Info

Publication number
WO1999015953A1
Authority
WO
WIPO (PCT)
Prior art keywords
read
write requests
disk
set
write
Prior art date
Application number
PCT/US1998/018441
Other languages
French (fr)
Inventor
Richard Joseph Oliver
Original Assignee
Sony Pictures Entertainment, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US93630297A priority Critical
Priority to US08/936,302 priority
Application filed by Sony Pictures Entertainment, Inc. filed Critical Sony Pictures Entertainment, Inc.
Publication of WO1999015953A1 publication Critical patent/WO1999015953A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0602Dedicated interfaces to storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0628Dedicated interfaces to storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601Dedicated interfaces to storage systems
    • G06F3/0668Dedicated interfaces to storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device

Abstract

A system for optimizing the order of read/write commands in a disk file system is described. An optimizing scheduler blocks the transmission of disk access requests from an application program to a disk controller until a relatively large set of read/write requests is collected. The scheduler then sorts the set of read/write requests into an order which corresponds to the physical distribution of sectors on the disk accessed by the set of read/write requests.

Description

OPTIMIZING SCHEDULER FOR READ/WRITE OPERATIONS IN A DISK FILE SYSTEM

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

The present invention relates generally to the field of computer operating systems, and more particularly to scheduling read/write requests in a disk-based file system.

BACKGROUND OF THE INVENTION

Disk-based operating systems divide a disk into physical sectors, which represent the fundamental units for data storage on the disk. A sector is a portion of the circumference of a track on the disk, and the read/write mechanism within the disk drive can access specific sectors on the disk directly. When an application issues a command to read a specific byte from a disk file, the file system locates the correct surface, track, and sector, reads the entire sector into a memory buffer, and then locates the requested byte within that buffer. Because sector sizes allocated on a disk may be small in relation to the total amount of data required in a typical data transaction, the read/write head may be forced to seek among several different sectors in response to a read or write command by an application program.

The physical read/write mechanism of a hard drive is generally in one of three states: idle, seek, or input/output (I/O). In order to maximize the I/O bandwidth of the disk drive, the time that the drive spends in the idle and seek states must be minimized. A seek state corresponds to moving the read/write mechanism to the correct sector, and the time required by a seek operation is referred to as the seek time. One method used in present disk drive systems of minimizing idle times is to ensure that new read/write requests are available as the previous read/write operations complete. This essentially creates a queue of read/write requests which minimizes the time gap (idle time) between individual read/write operations. Likewise, a method used in present disk drive systems of minimizing seek times is to analyze the queued read/write requests and perform them in the most efficient order with regard to the order of the sectors as they are accessed on the disk. A shortcoming of these present systems is that because they attempt to minimize the latency between the time a request is initiated and the time the operation is performed, the number of requests that are queued at any given time is also minimized. The minimal queue length consequently limits the number of requests which can be optimally ordered, thus resulting in a less than optimal ordering in cases where the number of requests exceeds the size of the queue.

Many general purpose file systems are suitable for use with applications for which minimal optimization of read/write requests occurs. Certain applications, however, require sets of data transfer operations to disk to be completed within a maximum time limit. This time requirement translates to a minimum disk input/output bandwidth, which is typically measured in terms of the number of bytes transferred to or from the disk per unit time. For such applications, excessive seek times may increase disk access times beyond the limits imposed by the minimum bandwidth requirements. Examples of I/O intensive applications include real-time applications which require extensive disk access. Although a common solution to this problem may be to use faster disk drives, design and cost constraints may prevent the use of adequately fast disk drives.

It is therefore an intended advantage of the present invention to provide a system for optimizing a large number of disk access requests in a disk file system for use with a broad range of disk drive devices.

SUMMARY OF THE INVENTION

The present invention discloses a method for increasing the disk I/O bandwidth in a disk-based file system. Disk access requests from an application program to a drive controller are blocked until a relatively large set of read/write requests is collected. An optimizing scheduler sorts the requests so that the order of the set of read/write requests corresponds to the physical distribution of the sectors on a disk which are to be accessed by each read/write request. The ordered set of read/write requests is then transmitted to the drive controller.

According to one embodiment of the present invention, the disk input/output queue is blocked from being processed for a fixed period of time while read/write requests accumulate in a buffer. The optimizing scheduler then optimizes the entire queue in a single pass and sends the operations to the drive controller in the optimized order.

An embodiment of the present invention effectively increases the size of the queue of read/write requests to the disk drive, so that new requests inserted into the queue have a greater chance of being optimally placed between requests already in the queue, thus improving the average seek time for the set of requests.

Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:

Figure 1 is a block diagram of a computer system which may be used to implement an embodiment of the present invention.

Figure 2 illustrates an example of sector distribution on disk media.

Figure 3 is a block diagram of a disk file system which uses an optimizing command scheduler according to one embodiment of the present invention.

Figure 4 is a flowchart illustrating the process of re-ordering sets of read/write requests to optimize disk drive performance according to one embodiment of the present invention.

Figure 5 is a table which illustrates the relative seek distances required in different command queuing systems, including a system according to one embodiment of the present invention.

DETAILED DESCRIPTION

A system for collecting and optimizing read/write commands in a disk file system is described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate explanation.

Hardware Overview

Figure 1 illustrates a block diagram of a computer which may be used to implement an embodiment of the present invention. The computer system 100 includes a processor 102 coupled through a bus 101 to a random access memory (RAM) 104, a read only memory (ROM) 106, and display device 120. Keyboard 121 and cursor control unit 122 are coupled to bus 101 for communicating information and command selections to processor 102. Also coupled to processor 102 through bus 101 is an input/output (I/O) interface 123 which can be used to control and transfer data to electronic devices connected to computer 100.

Mass storage device 107 is also coupled to processor 102 through bus 101. Mass storage device 107 represents a memory device which stores data accessible from processor 102 through a sector-based file system used by the computer system 100. Mass storage device 107 may be a persistent storage device, such as a floppy disk drive or a fixed disk drive (e.g., magnetic, optical, magneto-optical, or the like), which can directly access locations on the disk for reading and writing data to and from the disk media. Alternatively, mass storage device 107 could be a tape drive which accesses data blocks placed sequentially on streaming tape media. It should be noted that the architecture of Figure 1 is provided only for purposes of illustration, and that a computer used in conjunction with the present invention is not limited to this specific architecture.

Command Optimization

Disk file systems organize data on disks in sectors. Due to file fragmentation or sector interleaving, data relating to a particular file may be spread among several discontiguous sectors. Figure 2 illustrates an example of the distribution of related sectors on a disk file. Disk 200 contains several tracks and four sectors containing data. The sectors are numbered 1, 2, 3, and 4. If an application requests to read data from these sectors, a file system which employs a simple command FIFO (first-in, first-out buffer) would simply access the sectors in the order in which the commands are received. Thus, with reference to Figure 2, if the application requested data to be read from sector 1, sector 2, sector 3, and then sector 4, the read/write head of the disk drive would access the sectors in that order. As can be seen in Figure 2, however, this order requires several seek operations between read operations, since each sector is on a different track and the sectors are not arranged on disk in the order in which they are requested.

Certain known disk file systems use a command queuing system to minimize the number of seeks between sector read operations by ordering the sequence of the sectors as they are accessed by the application to correspond to the physical order of the sectors on the disk. For example, given the sector distribution illustrated in Figure 2, an optimal order for the read/write requests would be 1, 4, 3, 2. As can be seen in Figure 2, this order eliminates any extra track crossings between sequential sector accesses.

Present systems of efficiently ordering disk read/write requests, such as command queuing systems, however, provide only a limited queue size in which commands are re-ordered. Queue size is limited because these systems also attempt to minimize the read/write cycle latency, which is the measure of the amount of time between the time that a read/write request is initiated and the time that it is performed. As the length of the queue is increased, the latency correspondingly increases. Thus, in the example provided above, if the queue can only accommodate two instructions while an access command is being completed, the system would cause the sectors to be read in the order of 1-3-2-4. Given the distribution of the sectors in Figure 2, this re-ordered scheme does not reduce the number of seek operations from the original order of 1-2-3-4. Although Figure 2 illustrates an example in which only four sectors are accessed, it will be appreciated that most applications request disk accesses which involve many sectors. In these cases, the limitations of the simple FIFO and command queuing systems are amplified.
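The comparison among the three orderings discussed above can be sketched numerically. The following is only an illustration: the track numbers assigned to sectors 1 through 4 are hypothetical values chosen to be consistent with the 1-4-3-2 ordering (Figure 2 itself is not reproduced here), the names exampleLayout and totalSeekDistance are not part of the disclosure, and total track distance is used as a stand-in for the seek cost.

```cpp
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical layout: sector label -> track number (0 = outermost track).
// These values are assumptions chosen so that 1-4-3-2 runs outermost to innermost.
inline std::map<int, int> exampleLayout() {
    return {{1, 0}, {4, 1}, {3, 2}, {2, 3}};
}

// Sum of the track-to-track distances travelled when the sectors
// are visited in the given order.
inline int totalSeekDistance(const std::vector<int>& order,
                             const std::map<int, int>& trackOf) {
    int total = 0;
    for (std::size_t i = 1; i < order.size(); ++i) {
        total += std::abs(trackOf.at(order[i]) - trackOf.at(order[i - 1]));
    }
    return total;
}
```

Under these assumed track numbers, the FIFO order 1-2-3-4 and the two-command queue order 1-3-2-4 both travel five tracks, while the fully sorted order 1-4-3-2 travels three.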

Figure 3 illustrates a system according to one embodiment of the present invention in which the effective size of the read/write command queue is increased so that more read/write commands are inserted in the queue to be placed in the optimum order. In Figure 3, application 302 generates read/write requests for data on a disk controlled by drive controller 306. Drive controller 306 represents both the controller hardware circuit within the disk drive and the driver firmware executed by the disk drive. Drive controller 306 sends control signals which control the movement and read/write functionality of head mechanism 308. In one embodiment of the present invention, an optimizing scheduler 304 receives the read/write requests from application 302. Optimizing scheduler 304 collects a number of read/write requests, and then orders these requests so that seek operations between the accessed sectors are minimized.

In a method of the present invention, the optimizing scheduler 304 blocks the queue and holds the read/write requests in a buffer for a specified period of time. This period of time defines a scheduler period which can be varied depending on factors such as the rate of requests issued by the application, the I/O bandwidth requirements of the system, and the amount of fragmentation on the disk. When the end of the scheduler period is reached, the optimizing scheduler optimizes the entire queue in a single pass and sends the requests to the disk in the optimized order.
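A minimal sketch of the time-based scheduler period described above is given below. It is not the code of Appendix A: the Request structure, the PeriodScheduler class, and the convention of passing the current time in milliseconds to poll() are all assumptions made for illustration, and the volume offset is assumed to be the sort key.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical request record; only the volume offset matters for ordering.
struct Request {
    std::uint64_t offset;
};

class PeriodScheduler {
public:
    PeriodScheduler(std::uint32_t periodMs, std::uint32_t startMs)
        : periodMs_(periodMs), periodStartMs_(startMs) {}

    // Requests are only buffered here; nothing is released to the drive yet.
    void submit(const Request& r) { pending_.push_back(r); }

    // Called with the current time. Returns the whole buffer, sorted by
    // offset in a single pass, once the scheduler period has elapsed;
    // otherwise returns an empty batch and keeps accumulating.
    std::vector<Request> poll(std::uint32_t nowMs) {
        if (nowMs - periodStartMs_ < periodMs_) return {};
        std::vector<Request> batch;
        batch.swap(pending_);
        std::sort(batch.begin(), batch.end(),
                  [](const Request& a, const Request& b) { return a.offset < b.offset; });
        periodStartMs_ = nowMs;  // a new scheduler period begins
        return batch;
    }

private:
    std::uint32_t periodMs_;
    std::uint32_t periodStartMs_;
    std::vector<Request> pending_;
};
```

The caller would forward each returned batch to the drive controller in order. Passing the time in explicitly keeps the sketch independent of any particular clock source.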

Alternatively, instead of a period of time representing the threshold condition for releasing the requests, the optimizing scheduler could be configured such that a specified number of accumulated requests serves as the threshold condition. For example, the optimizing scheduler could accumulate 100 requests before optimizing the requests. For this embodiment, optimizing scheduler 304 includes a counter which maintains a count of the number of read/write requests received from application 302.
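The count-based threshold can be sketched in the same spirit. The CountScheduler class below is hypothetical and uses the size of its buffer as the counter; requests are represented only by an assumed volume offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

class CountScheduler {
public:
    explicit CountScheduler(std::size_t threshold) : threshold_(threshold) {}

    // Buffers one request. When the number of accumulated requests reaches
    // the threshold, the buffer is sorted by offset and returned as a batch;
    // otherwise an empty batch is returned.
    std::vector<std::uint64_t> submit(std::uint64_t offset) {
        pending_.push_back(offset);
        if (pending_.size() < threshold_) return {};
        std::vector<std::uint64_t> batch;
        batch.swap(pending_);
        std::sort(batch.begin(), batch.end());
        return batch;
    }

    // Number of requests currently held back from the drive controller.
    std::size_t count() const { return pending_.size(); }

private:
    std::size_t threshold_;
    std::vector<std::uint64_t> pending_;
};
```

A threshold of 100 would correspond to the example given above. A practical implementation would presumably also need a way to flush a partially filled buffer, which this sketch omits.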

According to one embodiment of the present invention, the optimizing scheduler 304 sorts the read/write requests so that the order of the requests corresponds to the order in which the sectors to be read are sequentially distributed on the disk. Thus, with reference to Figure 2, the optimum order of the sectors is 1-4-3-2, since these sectors are contained in sequential tracks from outermost to innermost track in this particular order. Furthermore, the sorting direction is reversed between successive sets of read/write requests, so that the drive mechanism sweeps across the disk in one direction first, then back in the other direction. This eliminates the need for the drive mechanism to move back to the opposite side of the disk after the innermost or outermost track of the disk has been reached.
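The alternating sweep direction can be illustrated with a short sketch that follows the same idea as the SortReadyList routine in Appendix A: a flag selects the sort direction and is toggled after every batch. The SweepSorter class and the choice of ascending order for the first batch are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

class SweepSorter {
public:
    // Sorts one batch of volume offsets in the current sweep direction,
    // then flips the direction for the next batch.
    void sortBatch(std::vector<std::uint64_t>& offsets) {
        if (ascending_) {
            std::sort(offsets.begin(), offsets.end());
        } else {
            std::sort(offsets.begin(), offsets.end(), std::greater<std::uint64_t>());
        }
        ascending_ = !ascending_;
    }

private:
    bool ascending_ = true;  // assumed starting direction
};
```

With this arrangement the last request of one batch and the first request of the next both lie near the same edge of the disk, so no long return seek is needed between batches.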

Figure 5 provides a table which summarizes the access request order and resulting number of seeks for the simple FIFO and two-command queue system described above, in comparison with the results obtained for a scheduler according to an embodiment of the present invention. The sector numbers provided in the Request Order column of Figure 5 correspond to the sector numbers illustrated in Figure 2.

With regard to the method in which the optimizing scheduler 304 sorts the order of the read/write requests, a simple sorting algorithm is implemented in which the sort operation is performed on all collected requests in one pass. Alternatively, the optimizing scheduler implements an incremental sorting algorithm in which requests are placed into their optimal order as they are received.
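The incremental alternative can be sketched as a sorted insertion: each arriving request is placed directly at its position in an already ordered queue, so that no separate sorting pass is needed when the batch is released. The function name insertSorted and the use of a plain vector of offsets are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Inserts one offset into a queue that is already sorted in ascending order,
// keeping it sorted. Equal offsets are placed after existing ones, so
// requests to the same location retain their arrival order.
inline void insertSorted(std::vector<std::uint64_t>& queue, std::uint64_t offset) {
    queue.insert(std::upper_bound(queue.begin(), queue.end(), offset), offset);
}
```

The one-pass approach sorts once per scheduler period, whereas this variant spreads the work over the arrival of the requests; which is preferable would depend on the batch size and the container used.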

In a further alternative of the present invention, the read/write operations are prioritized, so that background operations can be performed without impacting the real-time performance of the system. As the distribution of the read/write operations approaches worst case, background read/write requests can be deferred to a later set of requests so that higher priority read/write operations are completed in time. For better case sets of read/write operations, background read/write requests could be interleaved (and still seek-optimized) without causing real-time problems.
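One possible reading of this prioritization is sketched below. The description does not state how the scheduler decides that a set of requests is approaching worst case, so the sketch substitutes a simple, hypothetical capacity limit on the number of requests issued per batch; the names IoRequest, Plan, and planBatch are likewise assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct IoRequest {
    std::uint64_t offset;  // assumed volume offset used as the sort key
    bool background;       // true for low-priority (background) requests
};

struct Plan {
    std::vector<IoRequest> issueNow;  // seek-ordered set for this period
    std::vector<IoRequest> deferred;  // background requests held for a later set
};

// All real-time requests are always issued. Background requests are admitted
// only while the batch stays below the (hypothetical) capacity; the rest are
// deferred. Admitted requests are then interleaved by sorting on offset.
inline Plan planBatch(const std::vector<IoRequest>& batch, std::size_t capacity) {
    Plan plan;
    for (const IoRequest& r : batch) {
        if (!r.background) plan.issueNow.push_back(r);
    }
    for (const IoRequest& r : batch) {
        if (!r.background) continue;
        if (plan.issueNow.size() < capacity) {
            plan.issueNow.push_back(r);
        } else {
            plan.deferred.push_back(r);
        }
    }
    std::sort(plan.issueNow.begin(), plan.issueNow.end(),
              [](const IoRequest& a, const IoRequest& b) { return a.offset < b.offset; });
    return plan;
}
```

Deferred requests would be resubmitted with the next set. A more faithful criterion might estimate the total seek distance of the real-time requests instead of counting them.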

Figure 4 is a flowchart illustrating the major steps of optimizing the execution order of disk read/write commands according to a method for one embodiment of the present invention, and with reference to the file system illustrated in Figure 3. In step 402, the optimizing scheduler 304 collects disk read/write requests sent from the application program 302 for one scheduler period. The scheduler period corresponds to the number of read/write requests which are held in the read/write queue before the order of the read/write requests is optimized. After the specified number of read/write requests are collected in the optimizing scheduler buffer, the read/write requests are re-ordered so that the distance of seek operations between consecutive read/write operations is minimized, step 404. The read/write commands are then transmitted from the optimizing scheduler 304 to the drive controller 306 in the optimized order, step 406. If there are additional disk read/write requests issued from the application, the process repeats from step 402; otherwise the process ends.
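The loop of Figure 4 can be sketched as follows. The function runScheduler, the queue of incoming offsets, and the sendToController callback are hypothetical stand-ins for application 302 and drive controller 306; the scheduler period is expressed here as a number of requests, as in the paragraph above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

inline void runScheduler(std::deque<std::uint64_t>& incoming,
                         std::size_t perPeriod,
                         const std::function<void(std::uint64_t)>& sendToController) {
    assert(perPeriod > 0);  // a zero-length period would never make progress
    while (!incoming.empty()) {
        // Step 402: collect the requests for one scheduler period.
        std::vector<std::uint64_t> batch;
        while (!incoming.empty() && batch.size() < perPeriod) {
            batch.push_back(incoming.front());
            incoming.pop_front();
        }
        // Step 404: re-order the set so consecutive seeks are short.
        std::sort(batch.begin(), batch.end());
        // Step 406: transmit the set to the controller in the new order.
        for (std::uint64_t offset : batch) {
            sendToController(offset);
        }
        // The loop condition repeats from step 402 while requests remain.
    }
}
```

For simplicity every batch is sorted in ascending order here; the alternating sweep direction described earlier is left out.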

As has been mentioned earlier, increasing the command queue size increases the latency between the time a read/write request is initiated and the time the operation is performed. This is not generally a problem for write operations, and is not a problem for read operations if the disk locations to be accessed are known far enough in advance (as is usually the case for media tracks). Moreover, a larger queue requires larger data buffers than conventional queuing methods; however, this is often an acceptable trade-off for increased disk I/O bandwidth.

One embodiment of the present invention provides a file system method which is suitable for use with applications with high disk I/O bandwidth requirements in which excessive disk access time may negatively impact application performance. A method of the present invention provides a system for optimizing the order in which disk read/write commands are executed, so that disk seek operations may be reduced to satisfy the disk I/O bandwidth requirements of the application programs.

One exemplary application of the optimizing scheduler is its use in a computer implemented digital player/recorder including audio record and playback programs which read and write multiple tracks of data simultaneously to the disk. Such an application has rigorous minimum disk I/O bandwidth requirements for playback, and the disk head may be forced to seek among blocks on the disk in order to play a single track. In some instances, the data samples may not be read from the disk within the required time, in which case the audio data played back may be interrupted or otherwise distorted. The ordering of large sets of read/write requests in accordance with the physical distribution of sectors on the disk, as described in reference to Figure 4, minimizes the chance that applications fail due to excessive seek operations causing increased data transfer cycles.

Although the above discussion was written in the context of audio applications, it should be noted that a method of the present invention could be used in other similar applications, such as applications which involve both video and audio content, or applications which involve high-speed calculations on limited amounts of data.

In one embodiment of the present invention, the optimizing scheduler 304 illustrated in Figure 3 is implemented as a program which is executed by a processor coupled to the disk drive which is to be accessed (e.g., processor 102 in computer system 100). The optimizing scheduler program could be a program which is incorporated as part of the disk file system, or it could be a stand-alone program which can be called by the disk file system or an application program. Appendix A provides a detailed listing of C++ program code which implements an optimizing scheduler according to one embodiment of the present invention. It will be appreciated however, that methods of the present invention are not limited to the programming language and exact code sequences provided.

The steps of a method of the present invention may be implemented by a central processing unit (CPU) in a computer executing sequences of instructions stored in a memory. The memory may be a random access memory (RAM), read-only memory (ROM), a persistent store, such as a mass storage device, or any combination of these devices. Execution of the sequences of instructions causes the CPU to perform steps according to the present invention. The instructions may be loaded into the memory of the computer from a storage device or from one or more other computer systems over a network connection. Consequently, execution of the instructions may be performed directly by the CPU. In other cases, the instructions may not be directly executable by the CPU. Under these circumstances, the instructions may be executed by causing the CPU to execute an interpreter that interprets the instructions, or by causing the CPU to execute instructions which convert the received instructions to instructions which can be directly executed by the CPU. In other embodiments, hardwired circuitry may be used in place of, or in combination with, software instructions to implement the present invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the computer.

In the foregoing, a system has been described for collecting and optimizing the order of read /write requests in a disk file system. Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Appendix A

// NAME qfssched.hpp
//
// DESCRIPTION
// Quick version of scheduler
//
// COPYRIGHT
// copyright 1994 by Advanced Digital Systems Group. All rights reserved.
//
// VERSION CONTROL
// $Header: G: /PVCS/.ARCHIVES/hchallel/QFSSCHED.HPV 1.16 AUG 14, 1997 12:01:16 JCLAAR $
//

#ifndef _QFSSCHED_HPP_
#define _QFSSCHED_HPP_

#include <vector>
#include <list>

#include <dbllnk.hpp>
#include "task.hpp"
#include <timedef.h>
#include <sndfile.hpp>
#include "refcount.hpp"

class SyIoObject;

class SyIoReferenceObject : public SyReferenceCount
{
    SyIoObject* mIoObject;

public:
    inline SyIoReferenceObject(SyIoObject* ioObject);
    ~SyIoReferenceObject();
    inline void BuildIoList(IoNodeList& ioList);
    inline SyIoObject* IoObject() const;
    inline SyQFS* FileSystem(void);
    inline void RemoveIoObject();
    void AddNotify();
};

class SyQuickScheduler : public SyTask
{
    SySync::SycriticalSection mObjAccess;
    SySync::SycriticalSection mReadyObjAccess;

    typedef STD::vector<SyIoObject*, STD::allocator<SyIoObject*> > IoObjects;
    typedef STD::list<SyIoReferenceObject*, STD::allocator<SyIoReferenceObject*> > ReadyIoObjects;

    IoObjects mIoObject;
    ReadyIoObjects mReadyIoObject;
    UInt32 mPeriod;
    UInt32 mResolution;
    Bool mSortForward; // Sort direction for I/Os

    void SortReadyList(IoNodeList& listToSort);

protected:
    virtual UInt32 Iterate(UInt16 index);

public:
    HCHAN_EXP SyQuickScheduler(UInt32 period, UInt32 resolution);
    HCHAN_EXP ~SyQuickScheduler();
    virtual void AddObject(SyIoObject* object);
    virtual void RemoveObject(SyIoObject* object);
    virtual void AddReadyObject(SyIoReferenceObject* object);
    virtual void RemoveReadyObject(SyIoObject* object);
    inline UInt32 Period() const;
    inline UInt32 Resolution() const;
    inline TimeCode IoPeriod() const;
};

#include "qfssched.inl"

#endif

// NAME qfssched.cpp
//
// DESCRIPTION
// Quick version of scheduler
//
// COPYRIGHT
// copyright 1997 by Advanced Digital Systems Group. All rights reserved.
//
// VERSION CONTROL
// $Header: G:PVCS/ARCHIVES/hchallel/qfssched.cpv 1.37 AUG 22, 1997 14:03:24 jclaar $
//

#include "precomp.h"
#include "qfssched.hpp"
#include <mutex.hpp>
#include <algorithm>

void SyIoReferenceObject::AddNotify()
{
    mIoObject->AddNotify();
}

SyIoReferenceObject::~SyIoReferenceObject()
{
    _ASSERT(this->GetReferenceCount() == 0);
    //TRACE(_T("SyIoRefObj-Dstrct: %ld\n"), this);
}

SyQuickScheduler::SyQuickScheduler
(
    UInt32 period,
    UInt32 resolution
)
    : SyTask(
          NULL,
          NULL,
          0,
          FALSE,
          period,
          true,
          kEBxSchedulerPriority
      ),
      mPeriod(period),
      mResolution(resolution),
      mSortForward(TRUE)
{
}

SyQuickScheduler::~SyQuickScheduler()
{
    if (!mReadyIoObject.empty())
    {
        // TRACE(_T("qsch-dstrct size: %ld\n"), mReadyIoObject->size());
        for (ReadyIoObjects::iterator iter = mReadyIoObject.begin();
             iter != mReadyIoObject.end();
             iter++)
        {
            // TRACE(_T("qsch-dstrct: %ld\n"), (*iter));
            (*iter)->Decrement();
        }
    }
    this->kill();
}

void SyQuickScheduler::AddObject(SyIoObject* object)
{
    _ASSERT(object != NULL);
    this->Stop();
    {
        GrabCriticalSection cs(&mObjAccess);
        // don't add the object if it is already in here.
        _ASSERT(STD::find(mIoObject.begin(), mIoObject.end(), object) == mIoObject.end());
        mIoObject.push_back(object);
        this->SyTask::AddObject(object->Event());
    }
}

void SyQuickScheduler::RemoveObject(SyIoObject* object)
{
    _ASSERT(object != NULL);
    this->Stop();
    {
        GrabCriticalSection cs(&mObjAccess);
        IoObjects::iterator i =
            STD::find(mIoObject.begin(), mIoObject.end(), object);
        _ASSERT(i != mIoObject.end());
        mIoObject.erase(i);
        this->SyTask::RemoveObject(object->Event());
    }
    this->Start();
}

void SyQuickScheduler::AddReadyObject(SyIoReferenceObject* object)
{
    //
    // this code protects against duplicate ioobjects, but
    // with the new scheme, I don't think duplicate ioobjects
    // can ever happen. This could save some time (RMD)
    //
    GrabCriticalSection cs(&mReadyObjAccess);
    _ASSERT(object != NULL);
    // Don't add the object if it is already in here.
    if (object->GetReferenceCount() > 2)
    {
        //TRACE(_T("AddReady - cnt %ld\n"), object->GetReferenceCount());
        return;
    }
    //if (STD::find(mReadyIoObject->begin(), mReadyIoObject->end(), object)
    //return;
    mReadyIoObject.push_back(object);
    object->Increment();
    object->AddNotify();
    //TRACE(_T("QS-ARO:%ld\n"), object);
}

void SyQuickScheduler::RemoveReadyObject(SyIoObject* object)
{
        if ((*iter)->IoObject() == object)
        }
        else
            iter++;
    }
}

UInt32 SyquickScheduler :: Iterate (Kntl6 index) (

UInt32 start = SyTimeGetTime ( ) :

//

//get the file system that io's will be done on by

//quering the io object that was signaled

// _ASSERT(mIoObject. εize() == this->NumOb ects ( ) ) : #ifdef _DEBUG if (index != AIT_TIMEOUT)

(

_ASSER (index. =0 index, mloObj ect . size ()) :

_ASSERT(this->WaitObjects () [index] == mloObject [index]- >Event ( ) 0 ;

) #endif

// Don't need to get the mObjAccess critical section since AddObject

// and RemoveObject stop the scheduler. .SyQFS* pQFS = NULL; if (index != WAIT_TIMEOUT) ( pQFS =mlo0bject [index] ->FileSystem( ) ; ) else if( 1 mReadyloObj ect .empty () )

( mReadyObjAccess .Enter ( ) : pQFS = mReadyloObj ect. front () ->FileSystem( ) : mReadyObj ccess .Leave ( ) : )

/ /TRACE (_T( "qsch-index: %Id, readysize : %ld. pQFS : %ld, priority: %ld\n" ) , index, mReadyloObj ect->size () , pQFS, GetThreadPriority(this->Thread() ) ) ;

//if (index < mlo0bject->size ( ) )

/ /TRACE (_T ( "qsch-iter eventsig: %ld\n" ) , (mloObject) [ index] ->Event ( ) ) ;

_ASSERT(!(pQFS == NULL && index != WAIT_TIMEOUT));
if (pQFS != NULL)
{
    //
    // copy the current list to a temp list for those items that
    // match the designated filesystem. Protect access to list
    //
    // NOTE: The following code assumes that there is one scheduler
    // per disk volume. If there is not, the code below this will
    // assert.
    ReadyIoObjects tempList;
    mReadyObjAccess.Enter();
    // splice removes all objects from mReadyIoObject.
    tempList.splice(tempList.begin(), mReadyIoObject);
    mReadyObjAccess.Leave();

#ifdef _DEBUG
    // verify that all objects are really on the same file system.
    {
        ReadyIoObjects::iterator iter = tempList.begin();
        for (; iter != tempList.end(); iter++)
            _ASSERT((*iter)->FileSystem() == pQFS);
    }
#endif

    //
    // iterate over the temp list and call BuildIoList for all objects in
    // the temp list. if the temp list is empty, then reset the event that
    // triggered the scheduler (prevents endless reentry)
    //
    IoNodeList readyIoList;
    if (tempList.empty())
    {
        if (index != WAIT_TIMEOUT)
        {
            ::SyResetEvent(mIoObject[index]->Event());
        }
    }
    else
    {
        ReadyIoObjects::iterator iter;
        for (iter = tempList.begin(); iter != tempList.end(); iter++)
        {
            //TRACE(_T("QS~BIOL %ld\n"), (*iter));
            (*iter)->BuildIoList(readyIoList);
        }
    }

    // Sort the list by volume offset
    if (!readyIoList.empty())
    {
        this->SortReadyList(readyIoList);
        IoNodeList::iterator i = readyIoList.begin();
        for (; i != readyIoList.end(); i++)
            pQFS->PerformAsynchronousIO((*i).get());
    }
}

UInt32 last = SyTimeGetTime() - start;
this->SetTimeout((this->Period() > last) ? this->Period() - last : 0, FALSE);
return 0;

}

void SyQuickScheduler::SortReadyList(IoNodeList& listToSort)
{
    // Sort the list based on the current value of mSortForward
    if (mSortForward)
        listToSort.sort(STD::greater<SyAutoIoNode>());
    else
        listToSort.sort();
    // Next time, sort the other way.
    mSortForward = !mSortForward;
}
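The alternating sort direction in SortReadyList resembles an elevator-style sweep: one batch is issued in one offset order and the next batch in the reverse order, so the head need not travel back across the volume between batches. The following is a minimal sketch of that idea only. The `Request` struct, its `volumeOffset` field, the `AlternatingSorter` class and the `Offsets` helper are hypothetical names that do not appear in the listing, and the initial value of the direction flag is an assumption, since the listing does not show where mSortForward is initialized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <vector>

// Hypothetical stand-in for an I/O node; only the volume offset matters here.
struct Request {
    std::uint64_t volumeOffset;
};

// Requests compare by volume offset.
inline bool operator<(const Request& a, const Request& b) {
    return a.volumeOffset < b.volumeOffset;
}
inline bool operator>(const Request& a, const Request& b) {
    return a.volumeOffset > b.volumeOffset;
}

class AlternatingSorter {
public:
    // Sorts one batch, then flips the direction used for the next batch.
    void Sort(std::list<Request>& batch) {
        if (sortForward_) {
            // Mirrors the listing: a true flag selects the 'greater' ordering.
            batch.sort(std::greater<Request>());
            assert(std::is_sorted(batch.begin(), batch.end(), std::greater<Request>()));
        } else {
            batch.sort();  // ascending, via operator<
            assert(std::is_sorted(batch.begin(), batch.end()));
        }
        sortForward_ = !sortForward_;  // next time, sort the other way
    }

private:
    bool sortForward_ = true;  // assumed initial value
};

// Inspection helper: returns the offsets in list order.
inline std::vector<std::uint64_t> Offsets(const std::list<Request>& batch) {
    std::vector<std::uint64_t> out;
    for (const Request& r : batch) out.push_back(r.volumeOffset);
    return out;
}
```

Calling `Sort` on two successive batches with the same `AlternatingSorter` yields one batch in descending offset order and the next in ascending order.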

// NAME qfssched.inl
//
// DESCRIPTION
// Quick version of scheduler
//
// COPYRIGHT
// Copyright 1994 by Advanced Digital Systems Group. All rights reserved.
//
// VERSION CONTROL
// $Header: G: PVCS/ARCHIVES/hchallei/qfssched.inv 1.4 Dec 23, 1996 14:21:38 RMD $
//

#ifndef _QFSSCHED_INL_
#define _QFSSCHED_INL_

#include "ioobject.hpp"

//
// SyQuickScheduler
//
inline UInt32 SyQuickScheduler::Period() const
{
    return mPeriod;
}

inline UInt32 SyQuickScheduler::Resolution() const
{
}

inline Timecode SyQuickScheduler::IoPeriod() const
{
    return 0x8000;
}

//
// SyIoReferenceObject
//
inline SyIoReferenceObject::SyIoReferenceObject
(
    SyIoObject* ioObject
)
: SyReferenceCount(), mIoObject(ioObject)
{
}

inline void SyIoReferenceObject::BuildIoList
(
    IoNodeList& ioList
)
{
    //
    // assign member variable locally because Decrement() has
    // the potential to delete 'this'. Even though 'this' is passed
    // into BuildIoList(), it is only used for comparative purposes
    //
    SyIoObject* pIoObject = mIoObject;
    this->Decrement();
    pIoObject->BuildIoList(ioList, this);
}

inline SyIoObject* SyIoReferenceObject::IoObject() const
{
    return mIoObject;
}

inline SyQFS* SyIoReferenceObject::FileSystem(void)
{
    return (mIoObject != NULL /*&& this->GetReferenceCount() > 1*/) ? mIoObject->FileSystem() : NULL;
}

inline void SyIoReferenceObject::RemoveIoObject()
{
    mIoObject = NULL;
}

#endif
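The comment in BuildIoList describes a common hazard with intrusive reference counting: once Decrement() may delete the object, no member may be read afterwards, so the member pointer is copied to a local first. A minimal sketch of that pattern follows. `IoObject`, `IoReference` and the `destroyedFlag` parameter are hypothetical names introduced for illustration, not the classes from the listing. One deliberate difference: the sketch captures the object's address as an integer before the decrement and passes that, rather than passing `this` after the decrement as the listing does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical I/O object: appends its offset and records who asked.
struct IoObject {
    std::uint64_t offset = 0;
    std::vector<std::uintptr_t> requesters;

    void BuildIoList(std::vector<std::uint64_t>& ioList, std::uintptr_t requesterId) {
        ioList.push_back(offset);
        requesters.push_back(requesterId);  // compared or recorded only, never dereferenced
    }
};

// Hypothetical intrusive reference-counted wrapper around an IoObject.
class IoReference {
public:
    IoReference(IoObject* target, bool* destroyedFlag)
        : refCount_(1), target_(target), destroyedFlag_(destroyedFlag) {
        assert(target_ != nullptr && destroyedFlag_ != nullptr);
    }

    void Increment() { ++refCount_; }

    // May destroy the object; callers must not touch members afterwards.
    void Decrement() {
        if (--refCount_ == 0) delete this;
    }

    void BuildIoList(std::vector<std::uint64_t>& ioList) {
        // Copy everything needed while 'this' is still guaranteed to be alive.
        IoObject* target = target_;
        const std::uintptr_t selfId = reinterpret_cast<std::uintptr_t>(this);
        this->Decrement();                    // 'this' may be gone after this line
        target->BuildIoList(ioList, selfId);  // uses only the local copies
    }

private:
    // Private destructor: instances live on the heap and die via Decrement().
    ~IoReference() { *destroyedFlag_ = true; }

    int refCount_;
    IoObject* target_;
    bool* destroyedFlag_;
};
```

With a single reference, `BuildIoList` destroys the wrapper and still completes the call on the target. With an extra `Increment()`, the wrapper survives until the matching `Decrement()`.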

Claims

What is claimed is:
1. A method of executing disk access requests issued by an application program, the method comprising the steps of: receiving a plurality of disk read/write requests from said application program; collecting a set of said plurality of read/write requests until a predetermined threshold condition is reached; sorting the order of said set of read/write requests in relation to the physical distribution on a disk of sectors to be accessed by each read/write request of said set of read/write requests; and transmitting said read/write requests to a drive controller controlling said disk in the order created in said sorting step.
2. A method according to claim 1 wherein said set of read/write requests are stored in a buffer memory coupled to said drive controller.
3. A method according to claim 2 wherein said collecting step further comprises blocking said plurality of read/write requests from being transmitted from said application to said drive controller for a predetermined period of time.
4. A method according to claim 2 wherein said collecting step further comprises blocking said plurality of read/write requests from being transmitted from said application to said drive controller until a predetermined number of read/write requests is accumulated in said buffer.
5. A method according to claim 3 wherein said sorting step is performed on said set of read/write requests upon the collection of an entire set of read/write requests within said time period.
6. A method according to claim 3 wherein said sorting step is performed on each read/write request as it is received into said buffer.
7. An apparatus for optimizing the order of disk access commands issued by an application program, said apparatus comprising: a buffer receiving a plurality of read/write requests issued by said application program, said buffer receiving said plurality of read/write requests until a threshold condition is reached, said read/write requests received when said threshold condition is reached defining a set of read/write requests; a scheduler logically coupled to said buffer, said scheduler configured to sort said set of read/write requests wherein the order of said set of read/write requests corresponds to the physical distribution on a disk of sectors to be accessed by each read/write request of said set of read/write requests; and a drive controller logically coupled to said scheduler for controlling access to said disk in accordance with said set of read/write requests.
8. An apparatus according to claim 7 wherein said scheduler blocks said plurality of read/write requests from being transmitted from said application to said drive controller for a predetermined period of time.
9. An apparatus according to claim 7 wherein said collecting step further comprises a counter, said counter maintaining a count of read/write requests received by said buffer, said scheduler configured to block said plurality of read/write requests from being transmitted from said application to said drive controller until said counter reaches a predetermined number of requests.
10. An apparatus according to claim 8 wherein said scheduler sorts said set of read/write requests upon the collection of an entire set of read/write requests within said period of time.
11. An apparatus according to claim 8 wherein said sorting step is performed on each read/write request as it is received into said buffer.
12. An apparatus for executing disk access requests issued by an application program, the method comprising the steps of: means for receiving a plurality of disk read/write requests from said application program; means for collecting a set of said plurality of read/write requests until a predetermined threshold condition is reached; means for sorting the order of said set of read/write requests in relation to the physical distribution on a disk of sectors to be accessed by each read/write request of said set of read/write requests; and means for transmitting said read/write requests to a drive controller controlling said disk in the order created in said sorting step.
13. An apparatus according to claim 12 further comprising buffer means for storing said set of read/write requests.
14. A computer readable medium having stored thereon sequences of instructions which are executable by a processor, and which, when executed by the processor, cause the processor to perform the steps of: receiving a plurality of disk read/write requests from an application program; collecting a set of said plurality of read/write requests until a predetermined threshold condition is reached; storing said set of read/write requests in a buffer memory; sorting the order of said set of read/write requests in relation to the physical distribution on a disk of sectors to be accessed by each read/write request of said set of read/write requests; and transmitting said read/write requests to a drive controller controlling said disk in the order created in said sorting step.
15. A computer according to claim 14 wherein the memory further contains instructions which cause the processor to perform the step of blocking said plurality of read/write requests from being transmitted from said application to said drive controller for a predetermined period of time.
16. A computer according to claim 14 wherein the memory further contains instructions which cause the processor to perform the step of blocking said plurality of read/write requests from being transmitted from said application to said drive controller until a predetermined number of read/write requests is accumulated in said buffer.
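The steps recited in the claims (collect requests until a threshold is reached, sort them by the position of the sectors they touch, then transmit them in sorted order) can be illustrated with a short sketch. This is an illustration under stated assumptions, not the implementation from the appendix. `DiskRequest`, `BatchingScheduler`, the `Transmit` callback and the caller-supplied millisecond clock are all hypothetical. Both threshold variants from the dependent claims are shown: a request count and a time period.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical read/write request, identified by its first sector.
struct DiskRequest {
    std::uint64_t sector;
    bool isWrite;
};

class BatchingScheduler {
public:
    // Hypothetical stand-in for handing a request to the drive controller.
    using Transmit = std::function<void(const DiskRequest&)>;

    BatchingScheduler(std::size_t maxCount, std::uint32_t periodMs, Transmit transmit)
        : maxCount_(maxCount), periodMs_(periodMs), transmit_(std::move(transmit)) {
        assert(maxCount_ > 0);
    }

    // Buffers a request; flushes when the count or time threshold is reached.
    // The caller supplies the current time so the sketch stays deterministic.
    void Submit(const DiskRequest& request, std::uint32_t nowMs) {
        if (pending_.empty()) windowStartMs_ = nowMs;  // a new collection window begins
        pending_.push_back(request);
        if (pending_.size() >= maxCount_ || nowMs - windowStartMs_ >= periodMs_) Flush();
    }

    // Periodic check: flushes a non-empty buffer once the period has elapsed.
    void Tick(std::uint32_t nowMs) {
        if (!pending_.empty() && nowMs - windowStartMs_ >= periodMs_) Flush();
    }

    // Sorts the collected set by sector and transmits it in that order.
    void Flush() {
        std::vector<DiskRequest> batch;
        batch.swap(pending_);  // detach first so the callback may submit new requests
        std::stable_sort(batch.begin(), batch.end(),
                         [](const DiskRequest& a, const DiskRequest& b) {
                             return a.sector < b.sector;
                         });
        for (const DiskRequest& r : batch) transmit_(r);
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::size_t maxCount_;
    std::uint32_t periodMs_;
    Transmit transmit_;
    std::vector<DiskRequest> pending_;
    std::uint32_t windowStartMs_ = 0;
};
```

With a count threshold of 3 and a period of 100 ms, three requests submitted within the period are transmitted together in ascending sector order, and a later partial batch is transmitted when `Tick` observes that the period has elapsed.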
PCT/US1998/018441 1997-09-24 1998-09-03 Optimizing scheduler for read/write operations in a disk file system WO1999015953A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US93630297A true 1997-09-24 1997-09-24
US08/936,302 1997-09-24

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU93772/98A AU9377298A (en) 1997-09-24 1998-09-03 Optimizing scheduler for read/write operations in a disk file system

Publications (1)

Publication Number Publication Date
WO1999015953A1 true WO1999015953A1 (en) 1999-04-01

Family

ID=25468449

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/018441 WO1999015953A1 (en) 1997-09-24 1998-09-03 Optimizing scheduler for read/write operations in a disk file system

Country Status (2)

Country Link
AU (1) AU9377298A (en)
WO (1) WO1999015953A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001065835A2 (en) * 2000-02-28 2001-09-07 Sun Microsystems, Inc. A disk scheduling system with recordering of a bounded number of requests
US6385673B1 (en) 1999-10-06 2002-05-07 Sun Microsystems, Inc. System and method for adjusting performance of a media storage by decreasing a maximum throughput by a primary derate parameter to specify available & guaranteed rate parameters and determining ring buffer sizes for streams
EP1229433A1 (en) * 2001-01-31 2002-08-07 Hewlett-Packard Company File sort for backup
EP1229434A2 (en) * 2001-01-31 2002-08-07 Hewlett Packard Company, a Delaware Corporation File sort for backup
US6438630B1 (en) 1999-10-06 2002-08-20 Sun Microsystems, Inc. Scheduling storage accesses for multiple continuous media streams
GB2393804A (en) * 2002-10-02 2004-04-07 Hewlett Packard Co Retrieval of records from data storage media
US6721789B1 (en) 1999-10-06 2004-04-13 Sun Microsystems, Inc. Scheduling storage accesses for rate-guaranteed and non-rate-guaranteed requests
US7334103B2 (en) 2002-12-11 2008-02-19 Koninklijke Philips Electronics N.V. Methods and apparatus for improving the breathing of disk scheduling algorithms
WO2013128282A1 (en) * 2012-02-28 2013-09-06 Avg Technologies Cz, S.R.O. Systems and methods for enhancing performance of software applications
KR20160036693A (en) * 2014-09-25 2016-04-05 충남대학교산학협력단 Storage device and command scheduling method thereof
US10235203B1 (en) * 2014-03-31 2019-03-19 EMC IP Holding Company LLC Techniques for increasing storage system performance in processor-bound workloads with large working sets and poor spatial locality

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5591049A (en) * 1978-12-29 1980-07-10 Fujitsu Ltd File access control system
WO1997016783A1 (en) * 1995-10-30 1997-05-09 Sony Corporation Methods and apparatus for controlling access to a recording disk
US5644786A (en) * 1990-11-08 1997-07-01 At&T Global Information Solutions Company Method for scheduling the execution of disk I/O operations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"QUEUING ACCESS REQUESTS TO DIRECT ACCESS STORAGE DEVICE", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 38, no. 7, 1 July 1995 (1995-07-01), pages 423 - 425, XP000521743 *
PATENT ABSTRACTS OF JAPAN vol. 004, no. 142 (P - 030) 7 October 1980 (1980-10-07) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566432B2 (en) 1999-10-06 2013-10-22 Oracle America, Inc. Scheduling storage accesses for rate-guaranteed and non-rate-guaranteed requests
US6385673B1 (en) 1999-10-06 2002-05-07 Sun Microsystems, Inc. System and method for adjusting performance of a media storage by decreasing a maximum throughput by a primary derate parameter to specify available & guaranteed rate parameters and determining ring buffer sizes for streams
US6438630B1 (en) 1999-10-06 2002-08-20 Sun Microsystems, Inc. Scheduling storage accesses for multiple continuous media streams
US6721789B1 (en) 1999-10-06 2004-04-13 Sun Microsystems, Inc. Scheduling storage accesses for rate-guaranteed and non-rate-guaranteed requests
US6496899B1 (en) 2000-02-28 2002-12-17 Sun Microsystems, Inc. Disk scheduling system with bounded request reordering
WO2001065835A2 (en) * 2000-02-28 2001-09-07 Sun Microsystems, Inc. A disk scheduling system with recordering of a bounded number of requests
WO2001065835A3 (en) * 2000-02-28 2002-03-14 Sun Microsystems Inc A disk scheduling system with recordering of a bounded number of requests
EP1229434A2 (en) * 2001-01-31 2002-08-07 Hewlett Packard Company, a Delaware Corporation File sort for backup
EP1229434A3 (en) * 2001-01-31 2009-09-09 Hewlett-Packard Company, A Delaware Corporation File sort for backup
EP1229433A1 (en) * 2001-01-31 2002-08-07 Hewlett-Packard Company File sort for backup
US6772305B2 (en) 2001-01-31 2004-08-03 Hewlett Packard Development Company Lp Data reading and protection
GB2393804B (en) * 2002-10-02 2005-05-18 Hewlett Packard Co Retrieval of records from data storage media
GB2393804A (en) * 2002-10-02 2004-04-07 Hewlett Packard Co Retrieval of records from data storage media
US7334103B2 (en) 2002-12-11 2008-02-19 Koninklijke Philips Electronics N.V. Methods and apparatus for improving the breathing of disk scheduling algorithms
WO2013128282A1 (en) * 2012-02-28 2013-09-06 Avg Technologies Cz, S.R.O. Systems and methods for enhancing performance of software applications
US9110595B2 (en) 2012-02-28 2015-08-18 AVG Netherlands B.V. Systems and methods for enhancing performance of software applications
US10235203B1 (en) * 2014-03-31 2019-03-19 EMC IP Holding Company LLC Techniques for increasing storage system performance in processor-bound workloads with large working sets and poor spatial locality
KR20160036693A (en) * 2014-09-25 2016-04-05 충남대학교산학협력단 Storage device and command scheduling method thereof
KR101687762B1 (en) * 2014-09-25 2017-01-03 충남대학교산학협력단 Storage device and command scheduling method thereof

Also Published As

Publication number Publication date
AU9377298A (en) 1999-04-12

Similar Documents

Publication Publication Date Title
US6836819B2 (en) Automated on-line capacity expansion method for storage device
US5754888A (en) System for destaging data during idle time by transferring to destage buffer, marking segment blank , reodering data in buffer, and transferring to beginning of segment
KR100633982B1 (en) Moving data among storage units
US7047355B2 (en) Updated data write method using journal log
US6449666B2 (en) One retrieval channel in a data controller having staging registers and a next pointer register and programming a context of a direct memory access block
US5426736A (en) Method and apparatus for processing input/output commands in a storage system having a command queue
US6065095A (en) Method for memory allocation in a disk drive employing a chunk array and identifying a largest available element for write caching
US6237062B1 (en) Storage and access to scratch mounts in VTS system
US7159073B2 (en) Data storage and caching architecture
US6047356A (en) Method of dynamically allocating network node memory's partitions for caching distributed files
Gemmell et al. Multimedia storage servers: A tutorial
US6092154A (en) Method of pre-caching or pre-fetching data utilizing thread lists and multimedia editing systems using such pre-caching
US6023744A (en) Method and mechanism for freeing disk space in a file system
US5887151A (en) Method and apparatus for performing a modified prefetch which sends a list identifying a plurality of data blocks
US7076604B1 (en) Disk drive employing a disk command data structure for tracking a write verify status of a write command
US6041391A (en) Storage device and method for data sharing
KR900004758B1 (en) Mass storage disk drive deefective media handling
EP0559142B1 (en) Data storage format conversion method and system, data access method and access control apparatus
US5933840A (en) Garbage collection in log-structured information storage systems using age threshold selection of segments
US5787482A (en) Deadline driven disk scheduler method and apparatus with thresholded most urgent request queue scan window
US5890208A (en) Command executing method for CD-ROM disk drive
US7082494B1 (en) Disk drive executing a preemptive multitasking operating system comprising tasks of varying priority
EP1746491A1 (en) Method for accessing data, apparatus and recording medium for performing that method
US6490635B1 (en) Conflict detection for queued command handling in disk drive controller
JP2012508428A (en) Method and system for queuing transfers of multiple non-contiguous address ranges with a single command

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DK DK EE EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase in:

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase in:

Ref country code: CA

122 Ep: pct application non-entry in european phase