GB2283358A - Disk filing system

Info

Publication number
GB2283358A
Authority
GB
United Kingdom
Prior art keywords
blocks
data
disk
reading
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9419384A
Other versions
GB9419384D0 (en)
Inventor
Neil Andrew Harris
Paul Bamborough
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LIGHTWORKS EDITING SYSTEMS Ltd
Original Assignee
LIGHTWORKS EDITING SYSTEMS Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB939319702A external-priority patent/GB9319702D0/en
Application filed by LIGHTWORKS EDITING SYSTEMS Ltd filed Critical LIGHTWORKS EDITING SYSTEMS Ltd
Publication of GB9419384D0
Publication of GB2283358A
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B19/00Driving, starting, stopping record carriers not specifically of filamentary or web form, or of supports therefor; Control thereof; Control of operating function ; Driving both disc and head
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00Record carriers by type
    • G11B2220/20Disc-shaped record carriers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A disk filing system for reading blocks of data from a disk, comprising the steps of requesting the data blocks in a first order; determining whether moving to a first block to be read involves passing through or near to a second block which is further down said order; in the event that such a second block is identified, and that the consequent delay in reading the first block is acceptable, establishing a second order in which the second block will be read before the first block; and subsequently reading the blocks in accordance with said second order. Optionally, a cache and a read ahead facility are provided whereby, when a request is received for data from a first block, one or more adjacent blocks may be transferred to the cache so that, in the event of a subsequent request for data from one of said adjacent blocks, the data is read from the cache rather than from the disk; the read ahead is selectively activated, and requests for data include a command as to whether or not the read ahead facility is to be active. The system may also incorporate a method for transferring digital data from a disk to a non-memory mapped device in accordance with a request from a client task, wherein the task issues a command to a disk filing system for data to be transferred from the disk to the address of the non-memory mapped device, and the disk filing system transfers the data directly to that device without passing through a buffer controlled by the client task.

Description

Disk Filing System

This invention relates to a disk filing system for use with digital equipment such as computers. More particularly, the invention is concerned with a disk filing system with improved performance in handling files in demanding contexts such as video image handling.
At present the most cost-effective way of storing bulk data in a non-volatile, randomly accessible form is the magnetic disk. Various types of disk are available, such as the floppy disk, which may now hold up to 2.8 Mb of data; the "floptical" disk, which may hold 20 Mb or more; and the hard disk, which may hold up to 2 Gb with access times as low as a few milliseconds. There are also re-writable optical disks and, of course, read-only media such as CD-ROMs. With all of these systems it is desirable to reduce access times to a minimum, but there are constraints.
Where a system writes data to the disk, it is frequently the case that a file will not be stored in a contiguous area of disk space. Data clusters are stored in whatever space is available, and the system keeps track of where the clusters of a particular file are located. This means that accessing even a single file may require a number of read operations at a number of locations on the disk, increasing access time. Defragmentation utilities may be run at intervals, but these provide only a temporary solution.
Furthermore, there are circumstances where it is necessary to access a number of different files in quick succession, for example if displaying graphics frames in sequence in real time.
It is therefore desirable to improve the efficiency of access to stored data on media such as hard magnetic disks.
Various known systems exist for improving access times to data on hard disks. For example, caching may be used.
In this, information read from the disk is stored in a region of memory set aside as a disk cache. If the information is required again, then it will be read directly from memory rather than from the disk. A read ahead facility may be used, in which the system assumes that the next block of data required will be the next block in the file. Accordingly that data is stored in cache and if a request is made for it then it will be accessed quickly from the cache. A disadvantage is that if the assumption is wrong, unnecessary data will have been stored in the cache. Thus whilst there is sometimes an increase of speed in accessing data, there may be a certain degree of inertia.
Viewed from one aspect of the present disclosure there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of requesting the data with the blocks in a first order; determining whether moving to a first block to be read, in accordance with said order, involves passing through or near to a second block which is further down said order; in the event that such a second block is identified, and that the consequent delay in reading the first block is acceptable in accordance with established criteria, establishing a second order in which the second block will be read before the first block; and subsequently reading the blocks in accordance with said second order.
Viewed from another aspect of the present disclosure there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of requesting the data with the blocks in a first order; allocating priority information to each of the blocks; establishing a second order for reading the blocks, in which the priority information is taken into account; and subsequently reading the blocks in accordance with said second order.
In a simple arrangement, the priority rating could simply be an identifier indicating whether, for example, particular data was of primary or secondary priority.
The blocks could then be ordered in order of priority.
Such a system could be combined with the first inventive aspect referred to above, so that for example all blocks in a first priority category are then re-ordered to optimise access times, followed by all blocks in a second priority category.
Preferably, however, a more sophisticated system is employed. In this, each block of data is allocated a time (generally calculated from the moment of the request) within which reading must be started or completed. In general the completion time will be specified. By knowing also the (approximate) time which will be taken to complete reading a block once started, it is possible to optimise the reading of the blocks by putting them into a useful order. Obviously, knowing the time for completion of a particular block, it is possible to optimise conditions whether one specifies the commencement or finishing time of that particular block.
Thus viewed from another aspect of the present disclosure there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of allocating to each of the blocks information which indicates a maximum permitted time for commencement or completion of reading; establishing an order for reading the blocks, in which the information is taken into account; and subsequently reading the blocks in accordance with said order.
In a simple arrangement, it may be established that any data block must be read within a fixed period of, say, 60 seconds from a request or that all reads must be commenced within a fixed period. Preferably, however, a time is allocated to each block depending upon its particular circumstances such as the importance of the data and the use to which it is to be put.
For example, in editing of video frames held as digital data a situation may be encountered in which at a timecode value of t=30 (say) it is decided to cut to another scene at timecode value t=50. However, in this case the video clips are being played at double speed.
Thus the time available to get the clip is (50-30)/2 = 10 seconds. Thus the time for completion of reading is 10 seconds. Of course, play can be backwards or forwards. Viewed generally, the time available is equal to the modulus of the difference between event time and present time, divided by the (relative) play speed. It will be appreciated that the present system is of particular advantage in non-linear video editing systems where there is a built in timecode system and all events have an associated timecode value. In such systems, files can be a considerable size, such as 100 Mb or more and indeed the present disk filing system is of use in other applications where large files are stored, whether of a graphics type or otherwise.
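As a minimal sketch of the calculation just described (the function and parameter names are illustrative assumptions, not taken from the patent), the available time can be computed as follows. Speeds below 1x are clipped to 1x, which also guards against division by zero at zero speed.

#include <math.h>

/* Illustrative sketch: real time available (in seconds) to fetch material
   needed at timecode 'event_time', given the current timecode 'present_time'
   and the relative play speed. */
double time_available(double event_time, double present_time, double play_speed)
{
    double speed = fabs(play_speed);
    if (speed < 1.0)
        speed = 1.0;
    return fabs(event_time - present_time) / speed;
}

With the figures of the example above, time_available(50.0, 30.0, 2.0) returns 10.0 seconds.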
There will of course be circumstances in which it is impossible to achieve completion of reading within the time specified. There are then various options. For example, some existing data requests may be flagged as non-critical, even if a nominal time has been allocated to them. It may be possible to bypass these temporarily.
It may be desired simply to tell a user that a request cannot be complied with in the time necessary, either providing the data late or abandoning the request completely.
As noted above, in accordance with an aspect of the present disclosure there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of requesting the data with the blocks in a first order; determining whether moving to a first block to be read, in accordance with said order, involves passing through or near to a second block which is further down said order; in the event that such a second block is identified, and that the consequent delay in reading the first block is acceptable in accordance with established criteria, establishing a second order in which the second block will be read before the first block; and subsequently reading the blocks in accordance with said second order.
It will thus be appreciated that in accordance with this aspect of the disclosure an attempt is made to optimise disk access by reducing the amount of time which has to be taken in seeking back to read blocks which, in fact, have already been passed over. Avoidance of unnecessary alternation between backwards and forwards seeks improves performance. This aspect has been defined by reference to a simple system in which only two blocks are specified. In practice, of course, a greater number of blocks will be read and there may be complexities in establishing the optimum order. It is possible that an optimum order in respect of a small number of requests will be altered in the event that other requests are added.
As specified above, the system does not simply involve placing the blocks in an order for optimum speed of access judged in terms of all the blocks taken together. In accordance with the present aspect of the disclosure, attention is paid to whether there would be an unacceptable delay in accessing a block which is being skipped temporarily.
In the simplest system, it may be the case that any delay in accessing a block may be acceptable. For example, if all the blocks relate to the same file or operation to be carried out, then it may not matter which blocks are read first. By carrying out the system described above there will be an overall reduction in time to have all blocks read. In such an arrangement, the criteria may be such that any delay in reading a particular block is acceptable, providing the overall time to read all the blocks is improved. The system therefore simply optimises the order of reading to avoid alternation of forwards and backwards seeks.
In one possible arrangement, some of the aspects described above are combined, so that a first order is set using the time data and the order is then reconsidered in accordance with the efficiency of moving between blocks, whereby a second order is established whilst still ensuring reading of the blocks within their allocated times.
Thus, viewed from a further aspect of the disclosure there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of allocating to each of the blocks information which indicates a maximum permitted time for commencement or completion of reading; establishing a first order for reading the blocks, based on the order of expiry of the maximum permitted times; determining whether moving to a first block to be read, in accordance with said first order, involves passing through or near to a second block which is further down said order; in the event that such a second block is identified, establishing whether reading said second block before said first block means that reading of the first block will exceed its maximum permitted time; in the event that said maximum permitted time for the first block would not be exceeded, establishing a second order in which the second block will be read before the first block; and subsequently reading the blocks in accordance with said second order.
In practice a more complex situation may be encountered, with three, four or more blocks to be considered. In such cases, various permutations of orders will be considered to determine whether there is an optimum order in which the maximum permitted time for any particular block is not exceeded, whilst the overall time is minimised.
Viewed from another aspect of the disclosure, there is provided a method for reading a plurality of blocks of digital data from a disk, comprising the steps of allocating to each of the blocks information which indicates a maximum permitted time for commencement or completion of reading; establishing an order for reading the blocks in which the overall time for completion of reading all of the blocks is minimised by reduction or elimination of alternation of forwards and backwards seeks, whilst said maximum permitted time for any particular block is not exceeded; and reading the data blocks in accordance with said order.

The advantages of such a system can be seen by reference to the following examples. In both cases, requests are made at times t = 0, 1 and 2 for data blocks 1,000, 2,000 and 100,000 respectively. The maximum permitted times are respectively 10, 40 and 30 seconds.

In Table 1, the data is read simply in order of maximum permitted time and the total time elapsed is t = 13. In Table 2, the read order has been adjusted to prevent the long seek out to block 100,000 followed by the long seek back. All reads are completed within their maximum times but the overall time is reduced to t = 11.

Table 1

t=0   Start
t=1   Reach block 1,000 and start transfer of data
t=3   Finish transfer, start seek to block 100,000
t=6   Reach block 100,000 and start transfer of data
t=8   Finish transfer, start seek to block 2,000
t=11  Reach block 2,000 and start transfer of data
t=13  Finish transfer

Table 2

t=0   Start
t=1   Reach block 1,000 and start transfer of data
t=3   Finish transfer, start seek to block 2,000
t=4   Reach block 2,000 and start transfer of data
t=6   Finish transfer, start seek to block 100,000
t=9   Reach block 100,000 and start transfer of data
t=11  Finish transfer

Thus, by using the above system it is possible to have a disk filing system which anticipates what information is going to be required from disk, calculates when that information is required, and plans for maximum efficiency of transfer of the information.
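Purely as an illustration of this reordering, the sketch below starts from deadline order and accepts an adjacent swap only when it shortens the overall schedule without missing any deadline; the request structure, cost model and constants are assumptions for the sketch, not taken from the patent.

#include <stdlib.h>

/* Illustrative request record: a starting block and a deadline (maximum
   permitted completion time, in seconds from now). */
typedef struct {
    long   block;
    double deadline;
} ReadReq;

/* Very rough cost model: seek time proportional to distance plus a fixed
   transfer time. The constants are purely illustrative. */
static double read_cost(long from, long to)
{
    return 0.00003 * (double)labs(to - from) + 2.0;
}

/* Total completion time of reading the requests in the given order, or -1.0
   if some deadline would be missed. */
static double schedule_time(const ReadReq *req, int n, long head_pos)
{
    double t = 0.0;
    for (int i = 0; i < n; i++) {
        t += read_cost(head_pos, req[i].block);
        if (t > req[i].deadline)
            return -1.0;
        head_pos = req[i].block;
    }
    return t;
}

/* Start from deadline order, then accept any adjacent swap that keeps every
   deadline met and lowers (or first achieves) a feasible overall time. */
void improve_order(ReadReq *req, int n, long head_pos)
{
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int i = 0; i + 1 < n; i++) {
            double before = schedule_time(req, n, head_pos);
            ReadReq tmp = req[i]; req[i] = req[i + 1]; req[i + 1] = tmp;
            double after = schedule_time(req, n, head_pos);
            if (after < 0.0 || (before >= 0.0 && after >= before)) {
                /* swap back: a deadline was missed or nothing was gained */
                tmp = req[i]; req[i] = req[i + 1]; req[i + 1] = tmp;
            } else {
                improved = 1;
            }
        }
    }
}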
As noted earlier the system can determine whether a seek to one block will involve passing through or close to another block. The system may only permit reading of an "intermediate" block if it is passed through directly.
However, in a preferred embodiment the system permits a deviation from the direct seek path, by up to a maximum amount, in order to take in nearby blocks.
This permitted deviation can be a tuneable parameter, which can be altered depending on factors such as the relative seek time and data transfer rate of the disk being used. Of course, when the maximum permitted deviation becomes large, the efficiency will be lowered.
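A minimal sketch of this test, assuming illustrative names (the patent does not prescribe any particular interface), might be:

/* Sketch: decide whether 'candidate' may be read on the way from the current
   head position to 'target'. 'max_deviation' (in blocks) plays the role of
   the tuneable parameter described above; the names are illustrative. */
int on_the_way(long head_pos, long target, long candidate, long max_deviation)
{
    long lo = (head_pos < target) ? head_pos : target;
    long hi = (head_pos < target) ? target : head_pos;

    if (candidate >= lo && candidate <= hi)
        return 1;                                   /* passed through directly */
    if (candidate < lo)
        return (lo - candidate) <= max_deviation;   /* small detour one way */
    return (candidate - hi) <= max_deviation;       /* small detour the other way */
}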
In use of the system, the order in which reads are carried out may be updated every time a data request is received. The system will naturally take into account outstanding requests not yet carried out, and the time elapsed since those requests were made, when fitting a new request into the schedule.
The disk filing system may include other features, which are themselves inventive.
The system may therefore incorporate a "read ahead" facility, which is selectable. Thus, a client task requesting information from the disk filing system may specify whether read ahead would be advantageous. The task may know that there is likely to be a request for data from the next block, or that there is more likely to be a request for a different block. In the latter case, the read ahead facility will not operate and unwanted data will not be stored in the cache. It is a matter of choice in implementation as to whether there is a default to read ahead, which is cancelled at the request of the client task, or whether the client task activates read ahead.
Accordingly, viewed from another aspect of the disclosure, there is provided a method for reading a plurality of blocks of digital data from a disk, wherein a cache and a read ahead facility are provided whereby when a request is received for data from a first block, one or more adjacent blocks may be transferred to the cache so that in the event of a subsequent request for data from one of said adjacent blocks the data is read from the cache rather than from the disk, characterised in that the read ahead is selectively activated and that requests for data include a command as to whether or not the read ahead facility is to be active.
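By way of illustration only, a request interface carrying such a command might look like the following sketch; the function names, the enumeration and the helpers are assumptions, not part of the patent.

/* Assumed helpers, declared only so the sketch is self-contained */
int  dfs_read_from_cache_or_disk(int fh, long offset, long bytes, void *dst);
void dfs_prefetch(int fh, long offset, long bytes);

typedef enum { READ_AHEAD_OFF = 0, READ_AHEAD_ON = 1 } ReadAheadMode;

/* Sketch: the client task states in each request whether read ahead is wanted */
int dfs_read(int fh, long offset, long bytes, void *dst, ReadAheadMode ra)
{
    int rc = dfs_read_from_cache_or_disk(fh, offset, bytes, dst);

    /* Only pull the adjacent block(s) into the cache when the client task
       has indicated that a follow-on request is likely */
    if (rc == 0 && ra == READ_AHEAD_ON)
        dfs_prefetch(fh, offset + bytes, bytes);

    return rc;
}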
The system may also incorporate a novel feature for transferring data from disk to a non-memory mapped device such as a graphics display adaptor. Normally, a large proportion of transfers are from disk to a buffer space within the client task. The client task will have relative addressing, such that for example the buffer starts 1000 bytes into the task. However, the operating system for the computer will position the task within the available addressing space in the total memory, which may be many megabytes. Thus it is a function of the operating system to convert the relative addressing of the task buffer. However, non-memory mapped devices such as graphics adaptors have fixed addresses. A further inventive aspect of the present disclosure exploits this to accelerate the transfer of data to such devices.
Thus according to a further aspect of the disclosure there is provided a method for transferring digital data from a disk, wherein data is read from the disk in accordance with a request from a client task and, in the event that the data is to be supplied to a non-memory mapped device, the data is supplied directly to that device.
Viewed from another aspect of the disclosure there is provided a method for transferring digital data from a disk to a non-memory mapped device in accordance with a request from a client task, wherein the task issues a command to a disk filing system for data to be transferred from a disk to the address of a non-memory mapped device, and the disk filing system transfers the data from the disk to the non-memory mapped device directly without passing through a buffer controlled by the client task.
The non-memory mapped device could be a graphics adaptor, or e.g. a port connected to a printer or a modem.
The system for direct transfer of data is of particular importance in situations where large image files are to be transferred to a non-memory mapped graphics adaptor which controls the display on a monitor. Such circumstances arise in the context of video editing techniques.
Obviously, the data may be supplied from the disk to a cache before being passed on to the device. The important feature is that it is not necessary to pass the information through the client task and a buffer.
The disk filing system recognises that the address is not memory mapped and that it can supply the data directly. A special handler is used to accomplish these transfers and is loaded at startup.
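The dispatch decision can be pictured with the following sketch; the structures and names are assumptions for illustration (the patent's own handler appears as Routine 4 later in this description).

/* Illustrative description of one registered non-memory mapped device */
typedef struct {
    unsigned long base;                     /* fixed address of the device */
    unsigned long size;
    void (*write)(unsigned long addr, const void *src, long bytes);
} DeviceRange;

/* Assumed registration table, filled in by the special handler at startup */
extern DeviceRange g_devices[];
extern int         g_device_count;

/* Deliver data read from disk: straight to the device if the destination is
   a registered device address, otherwise report that the normal buffered
   path should be used. */
int dfs_deliver(unsigned long dst_addr, const void *data, long bytes)
{
    for (int i = 0; i < g_device_count; i++) {
        DeviceRange *d = &g_devices[i];
        if (dst_addr >= d->base &&
            dst_addr + (unsigned long)bytes <= d->base + d->size) {
            d->write(dst_addr, data, bytes);    /* no client-task buffer involved */
            return 0;
        }
    }
    return -1;    /* not a device address: caller falls back to the buffered path */
}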
As noted earlier, the preferred disk filing system plans its activities and calculates whether there is sufficient time to carry out tasks. It may not even start to try reading data if it knows that it will not be able to complete the task in an allotted time. The system may work interactively with a client task, by passing to it an estimate of the time to be taken for the data to be read. The task can then decide whether or not it is worthwhile performing the read and of course can itself seek instructions from and/or provide a status report to a user.
The system may also incorporate the ability to receive a command from a client task to the effect that certain data is to be read irrespective of the time taken.
Furthermore, there may be a facility for blocking and non-blocking transfers. With a blocking transfer, control is not handed back to the client task until the request for and transfer of data have been completed.
With a non-blocking transfer, control is handed back to the client task whilst transfer takes place.
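As an illustration (names and helpers assumed, not taken from the patent), the two styles might be exposed to the client task as:

/* Assumed helpers, declared only so the sketch is self-contained */
typedef void (*dfs_callback)(int request_id, int status);
int  dfs_queue_request(int fh, long offset, long bytes, void *dst, dfs_callback done);
int  dfs_request_complete(int request_id);
void dfs_poll(void);

/* Sketch: a blocking call waits until the transfer is finished; a
   non-blocking call queues the request and hands control straight back to
   the client task, which is told of completion via the callback. */
int dfs_transfer(int fh, long offset, long bytes, void *dst,
                 int blocking, dfs_callback done)
{
    int id = dfs_queue_request(fh, offset, bytes, dst, done);
    if (blocking) {
        while (!dfs_request_complete(id))
            dfs_poll();                 /* drive the filing system forward */
    }
    return id;
}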
A further preferred feature of the disk filing system is the use of a common code module for reading from disk, whether the data is indeed true data in a file or is directory information or is special data such as the boot block of the disk or the file allocation table ("FAT"). By keeping the disk filing system module as small as possible, it can be executed in less memory, resulting in lower system overheads. A feature is that all accesses appear to a client task as being file oriented. In a conventional system it would be necessary to access, for example, the FAT tables by finding out the absolute disk block number and reading a block at that number. In the preferred system disclosed herein it is possible to open a FAT table as if it were a file, and to refer to different blocks as if they were in a contiguous file, which physically they may not be.
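For illustration, opening the FAT as though it were a file might look like the sketch below; the special file name and helper functions are assumptions, as the patent only states that such tables can be opened and addressed as if contiguous.

/* Assumed helpers, declared only so the sketch is self-contained */
int  dfs_open(const char *name);
int  dfs_read_at(int fh, long offset, long bytes, void *dst);
void dfs_close(int fh);

/* Sketch: read one 16-bit FAT entry as if the FAT were an ordinary file,
   even though its blocks may not be physically contiguous on the disk */
int read_fat_entry(long index, unsigned short *value)
{
    int fh = dfs_open("$FAT");          /* "$FAT" is an assumed special name */
    if (fh < 0)
        return -1;

    int rc = dfs_read_at(fh, index * (long)sizeof *value,
                         (long)sizeof *value, value);
    dfs_close(fh);
    return rc;
}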
In a preferred form the disk filing system is compatible with a proprietary operating system such as MS-DOS (Trade Mark) and supports disks formatted under that system. This enables commercially available utilities to be used for disk diagnostics, maintenance, crash recovery and so forth.
In a preferred embodiment the table of data requests is not just held in a transient state in memory but is also written as a file to a disk. This will enable the file to be opened and the order inspected, in case of any difficulties so as to facilitate troubleshooting. The file may be overwritten each time a new table is created or may remain available for a certain period before being purged.
The system may also incorporate a provision for handling multiple disks. This can be of particular importance in handling large video files which could not be stored on a single disk. In such an arrangement, each disk would need to be provided with its own seek management system which analyses requests, calculates times and works out the table for the order in which the blocks are read.
However, there must be an overall file manager which receives the requests from a client task, directs each request to the appropriate disk, and analyses the replies from the individual seek management systems.
This enables the overall file manager to determine whether a particular request can be complied with in a specified time, taking into account what will be happening on each of the disks.
A multiple disk system could also be used to split files up deliberately when data is being written. For example, the first five blocks of a file could be stored on one disk, the next five on a second disk, the next five on a third disk, the next five back on the first disk, and so forth. Thus, if a request is received for fifteen consecutive blocks, the three disks can be working simultaneously to retrieve them, thus reducing access times.
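The block placement in this example can be expressed as a simple calculation; the sketch below uses the figures from the text (runs of five blocks across three disks) and is illustrative only.

#define BLOCKS_PER_RUN 5
#define NUM_DISKS      3

/* Which disk holds logical block 'n' of the file? */
int disk_for_block(long n)
{
    return (int)((n / BLOCKS_PER_RUN) % NUM_DISKS);
}

/* Which block within that disk's portion of the file is it? */
long local_block(long n)
{
    long run = n / (BLOCKS_PER_RUN * NUM_DISKS);   /* completed full stripes */
    return run * BLOCKS_PER_RUN + (n % BLOCKS_PER_RUN);
}

For example, blocks 0 to 4 fall on the first disk, blocks 5 to 9 on the second, blocks 10 to 14 on the third, and block 15 returns to the first disk, so a request for fifteen consecutive blocks keeps all three disks busy.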
Rather than have only a split file, it might be desirable to have a complete file on one disk, and then copy some parts to the second and some to the third.
This still enables the system to operate as described above, but there is the security of a complete copy of the file on one of the disks. In use of such a system, the overall file manager may determine whether the optimum location from which to read the data is on the basic disk or on one of the disks having a copy.
Whilst various examples and preferred aspects have been set out above, it will be appreciated that many variations are possible. The various aspects described are particularly useful in the handling of large image files but they may be used in other contexts.
Protection is sought for the various aspects of the disclosure referred to above and for modifications thereof.
A preferred system embodying some of the inventive aspects of the disclosure will now be described by way of example.
The particular system now described relies upon allocating a deadline to a data request. In the first routine the deadline is allocated and is referred to as a METRIC. The data to be read or written is an EXTENT, which is a variable sized but contiguous area of data.
The system is for use in a video editing system where sequences of frames are requested, and the speed and direction of play have to be taken into account.
ROUTINE 1

/* Read in 1 or more read-pending segments */
void dtx_read_segments(void)
{
    dtx_seginfo *entry;
    double play_time, play_speed;

    /* Is there nothing to do? */
    if (!dtx_active())
        return;

    /* If there is no free transfer buffer, a speculative read would seem like
       a bad idea. Wait until later, and get on with something else */
    if (dtx_xbuf_find_free() == NULL)
    {
        // dtx_trace_msg("read_segments: No free xfer buffer. Bouncing off.");
        dtx_xbuf_poll();
        return;
    }

    /* Are there any entries on the prefetch list? */
    entry = dtx_segQ_get_head(&dtx_RPEND_queue);
    if (entry == NULL)
        return;

    // A diagnostic
    // printf("%d entries on Q\n", dtx_get_RPEND_count());

    /* Get the current play-time and play-speed */
    play_time  = play_get_time();
    play_speed = play_get_speed();

    /* Clear the scan-flag in each element of the prefetch list, and calculate
       its metric (deadline), based on edit-time, play-time AND PLAY SPEED.

       Note: for the purpose of calculating deadlines, clip 'play_speed' to
       +/-1.0 MINIMUM. This causes servicing of requests as if playing at 1x
       speed, to pre-empt sudden acceleration to +/-1x play speed, assuming
       the loaders have issued some pre-emptive requests. It does not much
       matter which way this flips at speed = 0.0, as requests with negative
       deadlines are now serviced if time allows (which it will at 0.0 speed). */
    signed int play_direction = (play_speed < 0.0) ? -1 : +1;
    double clipped_play_speed = play_speed;
    if (fabs(clipped_play_speed) < 1.0)
        clipped_play_speed = (play_direction > 0) ? 1.0 : -1.0;

    entry = dtx_segQ_get_head(&dtx_RPEND_queue);
    while (entry != NULL)
    {
        /* How soon IN REAL TIME is this entry NEEDED?

           The deadline (edit time) has to be adjusted to reflect the fact
           that the stored time is centred on the extent. It is adjusted
           forwards to give THE LATEST edit time that could possibly be
           covered by the extent. This causes the fetcher not to view the
           extent as missed when in fact only its midpoint has passed, which
           is important at low speeds. It has the undesirable effect of
           producing a deadline that is artificially delayed; in practice this
           is not a problem, as all extents get the same bias and are fetched
           sufficiently ahead of time for it not to matter. The slop
           adjustment is a minor adjustment. */
        #define dtx_deadline_slop (4.0/24.0)   /* 4K @ 24K/sec */
        double slop_adjustment = dtx_deadline_slop * play_direction;

        entry->metric = (entry->edit_time + slop_adjustment - play_time)
                        / clipped_play_speed;
        entry->rpend_scan_flag = 0;

        /* Bump to next entry */
        entry = dtx_segQ_get_next(&dtx_RPEND_queue, &entry->RPEND);
    }

    /* Clear out the extent table */
    extent_table_level = 0;

    /* For each entry on the read-prefetch Q, try to find an extent from that
       segment */
    entry = dtx_segQ_get_head(&dtx_RPEND_queue);
    while (entry != NULL)
    {
        /* Unless the entry has already been scanned, call
           dtx_find_extent_from() to inspect its eligibility for pre-fetch */
        if (!entry->rpend_scan_flag)
            dtx_find_extent_from(entry);
        entry = dtx_segQ_get_next(&dtx_RPEND_queue, &entry->RPEND);
    }

#if 0
    printf("%d extents in list\n", extent_table_level);
    for (int i = 0; (i < 5) && (i < extent_table_level); i++)
        printf("found extent %d %d %lf\n",
               extent_first[i]->offset, extent_last[i]->offset,
               extent_metric[i]);
#endif

    /* Get the best extent in the list (if any) */
    DtxExtent *p_best_extent = find_best_extent();
    if (p_best_extent != NULL)
        read_extent(p_best_extent);
}

The Extent Table referred to is a table which stores the starting block number, the Extent Size in bytes, and the Metric. There may be a software pointer (p_extent) to identify the extents, or the table could store an extent number. The table would normally also contain the disk drive numbers for the requests.
As regards the extents, these could be made up of two or more Segments. Thus, if a first request is made to read blocks 100 to 200, and a second request is made for blocks 150 to 250, the system will consolidate these into a single extent of blocks 100 to 250. Each Segment may have its own critical time, and the time for the combined extent will therefore be the worst case.
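A minimal sketch of this consolidation step (structure and function names are illustrative assumptions) might be:

/* Illustrative extent record: an inclusive block range and its deadline */
typedef struct {
    long   first_block;
    long   last_block;
    double metric;          /* deadline in seconds */
} Extent;

/* Returns 1 and fills *out if the two segments touch or overlap and could be
   merged; returns 0 if they are disjoint. The combined deadline is the worst
   case (the earlier of the two). */
int merge_segments(const Extent *a, const Extent *b, Extent *out)
{
    if (a->first_block > b->last_block + 1 || b->first_block > a->last_block + 1)
        return 0;                                   /* disjoint: cannot merge */

    out->first_block = (a->first_block < b->first_block) ? a->first_block
                                                         : b->first_block;
    out->last_block  = (a->last_block  > b->last_block)  ? a->last_block
                                                         : b->last_block;
    out->metric      = (a->metric < b->metric) ? a->metric : b->metric;
    return 1;
}

Merging requests for blocks 100 to 200 and 150 to 250 in this way yields a single extent covering blocks 100 to 250, as in the example above.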
In practical terms the Metric can be thought of as the real time available before the data is required. For example, if the current position on the video is 20 seconds (timecode) and the start of the next scene is to be positioned at 50 seconds, and playback is at 2x speed, then the time available is (50-20)/2 = 15 seconds.
The second routine establishes the order for reading the extents; in the listing below it appears as the find_best_extent() function called from Routine 1. This routine refers to latency. Every system has an inherent latency, i.e. a minimum time to fetch anything. This is generally a constant, although it may be speed dependent at low speeds. The metric has to be compared with the latency to determine whether the task can be completed at all. Thus, if an extent is required in 5 milliseconds and the system latency is 10 milliseconds, there is no point in trying to get the extent as it will arrive too late.
This routine also ensures that effort is not expended upon getting extents which are not required for the nearest meetable approaching deadline; if none are approaching, the nearest receding deadline is used instead. An allowance is made to avoid fetching data that will not arrive before its play time due to read latency. Finally, extents not required until far in the future are ignored, improving the chance of further extent aggregation. The logic could be improved to favour elevator seeking if a number of extents on the same drive have similar deadlines.

ROUTINE 2

/* Find the best extent to fetch next */
DtxExtent *find_best_extent(void)
{
    #define dtx_read_latency (0.050 + 0.250 + 0.200)   /* Seek + read + xfer 256K */
    double read_latency = dtx_read_latency;

    /* At play speed < +/-1.0, reduce the latency porch as a special case, to
       get extents with artificially tight deadlines loaded. */
    double abs_play_speed = fabs(play_get_speed());
    if (abs_play_speed < 1.0)
        read_latency *= abs_play_speed;

    /* Assume the first extent is best unless we find a better one */
    DtxExtent *p_best_extent = &extent_table[0];
    double best_metric = p_best_extent->metric;

    DtxExtent *p_extent = &extent_table[1];
    for (int i = 1; i < extent_table_level; i++, p_extent++)
    {
        /* Is this extent an even better candidate? */
        double metric = p_extent->metric;
        if ( ((best_metric < read_latency) && (metric > best_metric))
          || ((best_metric > read_latency) && (metric > read_latency)
                                           && (metric < best_metric)) )
        {
            /* Register this extent and metric as the best yet */
            p_best_extent = p_extent;
            best_metric = metric;
        }
    }

    /* Is even the best extent's deadline close enough to be worth fetching? */
    if (dtx_prefetch_threshold < 0.0)
    {
        /* Cache the threshold value the first time through */
        dtx_prefetch_threshold = config_double("dtx_prefetch_threshold", 2.0);
        if (dtx_prefetch_threshold < 0.0)
            dtx_prefetch_threshold = 0.0001;
    }
    if (fabs(best_metric) > dtx_prefetch_threshold)
    {
        /* Even the best extent is not near enough to be worth fetching */
        return NULL;
    }

    return p_best_extent;
}

In the above system, the extents are ordered in accordance with the metrics, which are the times by which the read must be completed. However, by redefining the metrics, alternative systems could be implemented.
For example, the metrics could be related to the distance of a block from another. In that case, the order of the extents will be made to depend upon optimising the system to avoid unnecessary forwards and backwards seeks.
A third routine deals with the possibility of selective read ahead. In this routine, the first byte in the prefetch and the first byte not in the prefetch are defined. If these are set to the same value, there are no bytes to prefetch and the read ahead feature is effectively switched off.
ROUTINE 3

/* Specify data to prefetch */
int dtx_read_prefetch(int fh, long byte_ptr, long xfer_bytes, double edit_time)
{
    dtx_fileinfo *fi;
    long start_ptr;                        /* first byte in prefetch */
    long end_ptr;                          /* first byte NOT in prefetch */
    long sector_map[DTX_XFER_MAPSIZE];
    int  i, map_size;

    /* Emulation mode -- there is no DOS equivalent, but we report success anyway */
    if (dtx_is_dos_handle(fh))
        return(0);

    /* Allow any pending transfer buffers to complete */
    dtx_xbuf_poll();

    /* Validate the file handle */
    fi = dtx_get_fileinfo(fh);
    if (fi == NULL)
        dtx_splat("dtx_read_prefetch: Bad file handle.");
    if (fi->is_special)
        return(0);                         /* 'Success', but ignored */

#if DTX_DEBUG
    printf("dtx_read_prefetch: file %s, offset %ld, size %ld, deadline %.2lf\n",
           fi->fname, byte_ptr, xfer_bytes, edit_time);
#endif

    /* Work out the prefetch area in the file */
    if (xfer_bytes > 0)
    {
        start_ptr = byte_ptr;
        end_ptr   = byte_ptr + xfer_bytes;
    }
    else
        dtx_splat("-ve transfer size in read-prefetch");

    /* Clip the prefetch area at both ends. NB directories have no length of
       their own -- EOF is detected by mapping failure */
    if (start_ptr < 0)
        start_ptr = 0;
    if ((!fi->is_directory) && (end_ptr > fi->length))
        end_ptr = fi->length;
    xfer_bytes = end_ptr - start_ptr;

    /* If the entire transfer is clipped out, we do nothing and return success */
    if (xfer_bytes <= 0)
        return(0);

    /* Build a map of this transfer's sectors */
    map_size = dtx_map_multiple_sectors(fi, start_ptr, xfer_bytes,
                                        sector_map, DTX_XFER_MAPSIZE);
    if (map_size <= 0)
    {
        dtx_warn("read_prefetch: bad mapping");
        return(-1);
    }

    /* If any sector is badly mapped, do not attempt prefetch */
    for (i = 0; i < map_size; i++)
        if (sector_map[i] < 0)
        {
            dtx_warn("read_prefetch: bad sector mapping");
            return(-1);
        }

    /* Anything to read? Not a warning if the file is a directory: see above */
    if ((map_size <= 0) && !fi->is_directory)
    {
        herc_printf("read_prefetch: nothing to read: map size %d\n", map_size);
        return(0);
    }

    /* Put these sectors on the read-prefetch queue, by simulating a disk read
       operation */
    prefetch_disk_to_cache(fi->diskinfo, sector_map, map_size, xfer_bytes,
                           edit_time);

    /* Now call the polling routine, as it may have something new to do */
    dtx_poll();

    return(0);                             /* Success */
}
Such a device is treated as "magic" and data can be written to it directly. Data is sent in the normal manner by a memory to memory move for non-"magic" devices.
ROUTINE 4

/* Table of selector handlers */
typedef struct
{
    unsigned short selector;
    void (*rd_handler)(void *dst, void *src, long bytes);  /* From selector */
    void (*wr_handler)(void *dst, void *src, long bytes);  /* To selector */
} DtxSelInfo;

/* NOTE: Segment selectors for 'magic copy' operations.

   386/486 segment selectors look like:
       bits 0:1   requestor privilege level
       bit  2     table indicator  0 => LDT  1 => GDT
       bits 3:15  arbitrary index

   We choose arbitrary selectors that (1) are in privilege level 0,
   (2) look as though they are in the GDT and (3) do not clash with any of
   the Phar Lap selectors. */
static DtxSelInfo selector_tbl[] =
{
    { 0xF4, NULL, NULL },
    { 0xF6, NULL, NULL }
};
static int selector_free = 0;              /* Next free entry */

unsigned short dtx_magic_sel_alloc(
    void (*wr_handler)(void *dst, void *src, long bytes),
    void (*rd_handler)(void *dst, void *src, long bytes))
{
    DtxSelInfo *sel_info;

    /* Allocate a magic selector and descriptor */
    if (selector_free >= DIM(selector_tbl))
        dtx_splat("No magic selectors available");
    sel_info = &selector_tbl[selector_free++];

    /* Fill it in */
    sel_info->wr_handler = wr_handler;
    sel_info->rd_handler = rd_handler;

    /* Return the selector value */
    return sel_info->selector;
}

void dtx_farmemcpy(void far *dst, void far *src, long bytes)
{
    dtx_farmemcpy_evt.start();

    if (bytes < 0)
        dtx_splat("attempt to copy -ve length");

#ifdef DOS386
#if 0
    /* Slow, but useful, checks for debugging */
    if ((FP_OFF(dst) == 0x34) && dtx_rbuf_bounds_check(FP_OFF(dst), bytes))
        dtx_splat("attempt to write outside real-mode buffer");
    if ((FP_OFF(src) == 0x34) && dtx_rbuf_bounds_check(FP_OFF(src), bytes))
        dtx_splat("attempt to read outside real-mode buffer");
#endif

    /* Is there a magic handler for either selector? Note: must be a FAST check */
    DtxSelInfo *sel_info = &selector_tbl[0];
    for (int ix = 0; ix < selector_free; ix++, sel_info++)
    {
        if (sel_info->selector == FP_SEG(dst))
        {
            /* Call write handler */
            if (sel_info->wr_handler != NULL)
                sel_info->wr_handler((void *)dst, (void *)src, bytes);
            dtx_farmemcpy_evt.end();
            return;
        }
        else if (sel_info->selector == FP_SEG(src))
        {
            /* Call read handler */
            if (sel_info->rd_handler != NULL)
                sel_info->rd_handler((void *)dst, (void *)src, bytes);
            dtx_farmemcpy_evt.end();
            return;
        }
    }

    /* If we get here then use a standard mem-to-mem move */
    movedata(FP_SEG(src), FP_OFF(src), FP_SEG(dst), FP_OFF(dst), bytes);
#else
    memcpy(dst, src, bytes);
#endif

    dtx_farmemcpy_evt.end();
}

Thus the present disclosure provides a disk filing system which has improved routines for reading data from a disk, in which requests for data are prioritised and in which a request for data is allocated a maximum permissible time for retrieval.
Although particular embodiments incorporating inventive aspects of the disclosure have been described, many other variations and modifications and other uses will be apparent to those skilled in the art. The expression "disk" extends to other storage media which are not physically disks, in which the teachings of the disclosure will be relevant.

Claims (13)

1. A method for reading a plurality of blocks of digital data from a disk, comprising the steps of requesting the data with the blocks in a first order; determining whether moving to a first block to be read, in accordance with said order, involves passing through or near to a second block which is further down said order; in the event that such a second block is identified, and that the consequent delay in reading the first block is acceptable in accordance with established criteria, establishing a second order in which the second block will be read before the first block; and subsequently reading the blocks in accordance with said second order.
2. A method as claimed in claim 1 including the step of allocating priority information to each of the blocks; and in which the priority information is taken into account when establishing the second order for reading the blocks.
3. A method as claimed in claim 1 or 2, comprising the step of allocating to each of the blocks information which indicates a maximum permitted time for commencement or completion of reading; and in which such information is taken into account when establishing the second order for reading the blocks.
4. A method as claimed in claim 1, 2 or 3, wherein a cache and a read ahead facility are provided whereby when a request is received for data from a first block, one or more adjacent blocks may be transferred to the cache so that in the event of a subsequent request for data from one of said adjacent blocks the data is read from the cache rather than from the disk, and wherein the read ahead is selectively activated and requests for data include a command as to whether or not the read ahead facility is to be active.
5. A method as claimed in any preceding claim, wherein data is read from the disk in accordance with a request from a client task and, in the event that the data is to be supplied to a non-memory mapped device, the data is supplied directly to that device.
6. A method as claimed in claim 5 wherein the task issues a command to a disk filing system for data to be transferred from the disk to the address of the non-memory mapped device, and the disk filing system transfers the data from the disk to the non-memory mapped device directly without passing through a buffer controlled by the client task.
7. A method as claimed in claim 1, comprising the steps of allocating to each of the blocks information representative of a time within which the read must be completed; and establishing the second order for reading the blocks, in which such information is taken into account.
8. A method as claimed in claim 7, wherein a threshold value is set and if the time within which the read must be completed is beyond that threshold then the read operation is delayed.
9. A method as claimed in claim 7 or 8 wherein a latency value is set and if the time within which the read must be completed is greater than the latency value then the read operation is not attempted.
10. A method as claimed in any preceding claim, in which a request is made for a first segment, a request is made for a second segment, and in the event that the first and second segments are contiguous or overlap, a combined segment incorporating all of the data of the first and second segments is made the subject of a single read operation.
11. A method for reading a plurality of blocks of digital data from a disk, comprising the steps of requesting the data with the blocks in a first order; allocating priority information to each of the blocks; establishing a second order for reading the blocks, in which the priority information is taken into account; and subsequently reading the blocks in accordance with said second order.
12. A method for reading a plurality of blocks of digital data from a disk, comprising the steps of allocating to each of the blocks information which indicates a maximum permitted time for commencement or completion of reading; establishing an order for reading the blocks in which such information is taken into account; and subsequently reading the blocks in accordance with said order.
13. A method for reading a plurality of blocks of digital data from a disk, comprising the steps of allocating to each of the blocks information representative of a time within which the read must be completed; establishing an order for reading the blocks, in which such information is taken into account; and subsequently reading the blocks in accordance with said order.
GB9419384A 1993-09-24 1994-09-26 Disk filing system Withdrawn GB2283358A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB939319702A GB9319702D0 (en) 1993-09-24 1993-09-24 Disk filing system
US16747293A 1993-12-15 1993-12-15

Publications (2)

Publication Number Publication Date
GB9419384D0 GB9419384D0 (en) 1994-11-09
GB2283358A true GB2283358A (en) 1995-05-03

Family

ID=26303569

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9419384A Withdrawn GB2283358A (en) 1993-09-24 1994-09-26 Disk filing system

Country Status (1)

Country Link
GB (1) GB2283358A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1128254A3 (en) * 2000-02-22 2004-03-24 Hewlett-Packard Company System and method for detecting and correcting fragmentation on optical storage media
EP1569238A1 (en) * 2004-02-24 2005-08-31 Sony Corporation Reproducing apparatus and reproducing method
CN1681308B (en) * 2004-02-24 2012-04-25 索尼株式会社 Reproducing apparatus and reproducing method
US8224159B2 (en) 2004-02-24 2012-07-17 Sony Corporation Reproducing apparatus and reproducing method for reproducing and editing video clips
WO2009127964A1 (en) * 2008-04-17 2009-10-22 Nokia Corporation System and method for improving operations on a media server
RU2465637C2 (en) * 2008-04-17 2012-10-27 Нокиа Корпорейшн System and method for improving operation of media server

Also Published As

Publication number Publication date
GB9419384D0 (en) 1994-11-09

Similar Documents

Publication Publication Date Title
US4571674A (en) Peripheral storage system having multiple data transfer rates
US5226141A (en) Variable capacity cache memory
US7159073B2 (en) Data storage and caching architecture
US4476526A (en) Cache buffered memory subsystem
EP0106212B1 (en) Roll mode for cached data storage
EP0781432B1 (en) Multimedia editing system using pre-caching data utilizing thread lists
US4972364A (en) Memory disk accessing apparatus
US6151660A (en) Information recording/reproducing apparatus having segmented cache memory
US4500954A (en) Cache bypass system with post-block transfer directory examinations for updating cache and/or maintaining bypass
US6848034B2 (en) Dense server environment that shares an IDE drive
US6842801B2 (en) System and method of implementing a buffer memory and hard disk drive write controller
US5619675A (en) Method and apparatus for cache memory management using a two level scheme including a bit mapped cache buffer history table and circular cache buffer list
US5497472A (en) Cache control method and apparatus for storing data in a cache memory and for indicating completion of a write request irrespective of whether a record to be accessed exists in an external storage unit
US6038619A (en) Disk drive initiated data transfers responsive to sequential or near sequential read or write requests
IL105895A (en) System and method for dynamically controlling cache management
JP3113353B2 (en) Rotating memory system
US7000077B2 (en) Device/host coordinated prefetching storage system
US20010020260A1 (en) Method and system of reading and writing data by a disk drive apparatus
GB2283358A (en) Disk filing system
US7191319B1 (en) System and method for preloading cache memory in response to an occurrence of a context switch
US20020004887A1 (en) Technique for maintaining coherency between shared-disk and disk caches
US7613867B2 (en) Information recording apparatus, information recording method and recording medium recording program
US7451269B2 (en) Ordering real-time accesses to a storage medium
US7260693B2 (en) Adaptive disk layout techniques and tools
EP0418723B1 (en) Buffered disk unit and method of transferring data therein

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)