WO2004109424A2 - A novel file system - Google Patents

Info

Publication number
WO2004109424A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
language
command
format
file
Prior art date
Application number
PCT/IN2004/000110
Other languages
French (fr)
Other versions
WO2004109424A3 (en)
Inventor
Vinayak K. Rao
Original Assignee
Vaman Technologies (R & D) Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vaman Technologies (R & D) Limited
Publication of WO2004109424A2
Publication of WO2004109424A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers

Definitions

  • the present invention relates to computer processing systems and to a file system for storing and retrieving information.
  • A file system controls how files are named and stored on physical media to enable easy storage and retrieval.
  • a file system from a macro view is a collection of files and directories and the operations available to be performed on them. Since the origin of long-term storage media much effort has been made to create file systems that are easily accessible and that facilitate manipulation of files.
  • Applications send data in the form of a byte stream to an operating system.
  • This byte stream contains information associated with the file information being sent by the applications.
  • The information may comprise the filename, the size of the file, the extension, the pattern of file usage with respect to the operating system functionality, the identity of applications associated with the file, and the like.
  • the operating system divides the byte stream received into chunks of data that are stored on media.
  • the media is divided into sectors and clusters, with a number of sectors comprising a cluster.
  • the byte stream is stored in clusters. This entire process is carried out by the operating system, which stores these byte streams into entities termed as files.
  • the byte stream may either contain data regarding a new file that is to be created and stored on the media or may contain information regarding an already existing file that needs to be updated, deleted or modified.
  • Fig. 1 illustrates a block diagram of the physical interpretation of a hard disk drive (HDD) on which the physical file system is stored.
  • The HDD comprises platters 100, 105, 110 mounted over a central spindle 115.
  • The disk heads 120, 125, 130, 135, 140, 145, which are positioned on both sides of each platter, are capable of being repositioned for reading and writing.
  • The OS divides storage media such as an HDD or FDD into concentric tracks 150, 155, 160. These tracks 150, 155, 160 are divided to form sectors 165.
  • a collection of such sectors 165 from various consecutive platters forms the cylinder 170.
  • a collection of clusters is called a file.
  • the physical sector grouping is OS specific which means that the number of sectors in a cluster is different on different operating systems.
  • Fig. 2 is a block diagram showing how a legacy file system organizes and stores files and file related data.
  • Each operating system follows its own proprietary protocol of managing byte streams of data in an effort to attain optimal performance and resource utilization.
  • Each OS vendor utilizes its own proprietary file system that is optimized based on OS specific requirements.
  • Raw HDD 200 illustrated in Fig. 2 is partitioned into <C:> 210 and <D:> 215 using a command such as 'fdisk' 205.
  • the C: partition 210 can then be formatted using a 'FORMAT' command.
  • the formatting procedure creates a ROOT 220, FAT 225 and user space 230 on the formatted partition.
  • The ROOT 220 comprises file details, for example, the size of the file, the file name, user privilege attributes and the like.
  • the File Allocation Table (FAT) is a table that contains information showing where the file is stored. It provides file specific pointers to the sector on the hard disk where the corresponding file begins.
  • the HDD 235 also includes user space 230 for collection and storage of actual data. Such user space can either be used space 240, i.e., space that has already been allocated to files, or free space 245, i.e., space that has yet to be allocated to files.
  • When a user accesses a file, the operating system references information available in the ROOT and FAT and then traverses through the clusters to retrieve file data contained in the user space.
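  • The lookup described above can be illustrated with a minimal sketch. It assumes a toy in-memory model: the names ROOT, FAT, NEXT_CLUSTER and USER_SPACE, the file "notes.txt" and all cluster numbers are hypothetical, and the cluster chain is an added assumption, since the text only states that the FAT points to where a file begins.
```python
# Hypothetical in-memory model of a formatted partition (all values are made up).
ROOT = {"notes.txt": {"size": 10}}           # ROOT: file name -> file details
FAT = {"notes.txt": 2}                       # FAT: file name -> first cluster of the file
NEXT_CLUSTER = {2: 5, 5: 3, 3: None}         # assumed chain: cluster -> next cluster
USER_SPACE = {2: b"abcd", 5: b"efgh", 3: b"ij\x00\x00"}  # cluster -> raw bytes


def read_file(name: str) -> bytes:
    """Consult ROOT and FAT, then traverse the clusters to collect the data."""
    size = ROOT[name]["size"]       # file details come from the ROOT
    cluster = FAT[name]             # starting location comes from the FAT
    chunks = []
    while cluster is not None:      # walk cluster by cluster through user space
        chunks.append(USER_SPACE[cluster])
        cluster = NEXT_CLUSTER[cluster]
    return b"".join(chunks)[:size]  # drop padding in the last cluster
```
  • Every read has to walk the chain from the first cluster, which is the traversal cost the later sections contrast with hash-based access.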
  • Files can be stored in logical entities called directories.
  • The directories act as containers for the files within them.
  • The directory file structure is analogous to a parent child relation, and this logical relation, with respect to physical media storage, comprises what is popularly known as a file system.
  • FAT32 implemented in Windows 98 from Microsoft Corporation
  • HPFS implemented in OS/2 from IBM
  • NTFS implemented in Windows NT from Microsoft Corporation
  • The FAT32 file system is the first 32-bit file system and was implemented by Windows 98. In addition to integrated 32-bit file and disk access, the structure on the disk was basically the same as a previous file system called DOS v4. Most of the Windows 98 OS properties were similar to the DOS v4 OS to facilitate backward compatibility. Windows 98 was a single user, multitasking system. It lacked multi-user support and an important feature of the Windows operating systems called 'scandisk'. 'Scandisk' was not included on Windows 98 because 'scandisk' was a 16-bit program which could not process volumes using the FAT32 file system that has a FAT larger than 6 MB or less than 64 KB in size.
  • Another file system called the HPFS file system was developed for the OS/2 operating system provided by International Business Machines ('IBM'). This file system addressed several shortcomings of previous file systems and included advantages such as optimized file system performance and the ability to recover from more severe faults than other file systems existing at that time.
  • the HPFS implementation required high processing power, high memory and large disk space, making it difficult to use and execute applications on the OS/2.
  • the HPFS file system lacked good security and efficient search functionalities of the files stored on the media.
  • legacy OSs only include primitive commands such as read, write, flush, open and close type commands for file access and manipulation. These commands do not provide the user with much flexibility to manipulate the data stored on the media.
  • Another problem not adequately addressed by NTFS includes issues arising from fragmentation of files. Since all operating systems use clusters to store files, and cluster size is predetermined, OSs often use more space than is required by a file to store the file on media. This leads to fragmentation of the FAT disk and further reduces file access speed in addition to unnecessary memory wastage. Although this was addressed in OS/2 and Windows NT file systems by searching for the maximum amount of continuous unused space that would be large enough to accommodate the file before recording it, the problem was not completely addressed.
  • Operating systems do not have exposed Application Program Interfaces (APIs) that can be used to store data directly on the file system. This causes application programmers to write applications which store data in their own proprietary formats without any definitions regarding the data.
  • Microsoft Word sends its data to the operating system in a different format than does Corel Draw or database applications.
  • the proprietary format of data is translated into file format, i.e., received by the operating system and encapsulated by the operating system into a file.
  • the file is either stored in contiguous sectors on the media or fragmented and stored in parts, based on the amount of contiguous space available. Therefore, even though the data is stored on media, the format of the data stored is proprietary to the applications that create the data.
  • databases use some of the most sophisticated technologies to store, access and modify data.
  • Present day databases use tables and records to store data.
  • For example, a unique hash can be generated for the type of data being received, and the data can then be saved in a specific record or in contiguous sectors in the table itself. These procedures facilitate easy data retrieval and access.
  • the database that fires an SQL query needs to translate the SQL query into a read or write command, which in turn fetches the data from the file system.
  • the intended functionality of the query can only be performed once the data has been fetched by the operating system. This causes a level of translation and often times multiple levels of translation, which substantially slows down the access rate of the files.
  • Fig. 1 is a block diagram of the physical interpretation of the Hard Disk Drive (HDD), called the physical file system.
  • Fig. 2 is a block diagram depicting the logical interpretation of the HDD, called the logical file system.
  • Figs. 3a, 4a and 5a show how legacy file systems 300 are organized and store data.
  • Figs. 3b, 4b, and 5b show how an embodiment of a VFS is organized and stores data.
  • Figs. 6 and 7 show a block diagram and the corresponding flow representation of one embodiment of a VFS of the invention.
  • Fig. 8 is a diagram depicting the interpretation of commands to SQL, that is, how the operations that can be performed on a file system can be translated to database operations.
  • Fig. 9 is a diagram depicting the functionality of the "Connect" and "DISCONNECT"
  • Figs. 10 and 11 depict the various File and Directory commands translated to corresponding database query statements.
  • Fig. 12 illustrates an embodiment of the components required for the working of the VFS.
  • Fig. 13 describes a flow chart representing embodiments of the invention.
  • Fig. 14 illustrates an embodiment of the data definition tables.
  • Fig. 15 illustrates a flow chart for translating a command.
  • the present invention may be embodied in several forms, structures and manners.
  • the invention pertains to a file system in an operating system and, in particular, pertains to storing and utilizing data in a table structure format that is database compliant. Embodiments of the present invention utilize the searching, storage and other database benefits in the context of an operating system.
  • a database formulated file system is referred to as a Virtual File System ("VFS").
  • the VFS is used in an OS so that the OS file system stores file data in a table structure format, which is a database compliant one and, in an alternative embodiment, an Open Database Connectivity (ODBC) compliant one.
  • the VFS facilitates the use of a table language, i.e., a language that can act on data stored in a table structure format.
  • Table languages such as SQL, are generally functionally rich languages and provide improved data access and manipulation functionalities and performance.
  • One reason that table languages provide for improved functionality and performance is that the table languages act on data stored in table structure format, which typically include tables including records and incorporate hashing functions.
  • When new data is stored to the VFS, the VFS creates a unique hash, which is linked to a particular cell of the data matrix stored on the media. Data access is, therefore, improved because the number of records to be traversed when locating a file is reduced. For example, to access the record of Mr. John, files beginning with "J" and then "Jo" are searched based on the unique hash, but files beginning with other letters such as "S" are not. Thus, the need to traverse through Mr. Smith's data is eliminated; and, to add another field in Mr. Smith's record, the need to traverse through other fields associated with Mr. Smith is eliminated. This is done using the hash, which can be used to directly access the required cell in the data matrix.
  • In legacy systems, the operating system stored application files as separate files without incorporating a hashing function and, therefore, had to traverse unnecessary files on the media to access or search for a specific application file.
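  • The difference between traversal and hash-based access can be sketched as follows. This is only an illustration: it uses a Python dict as a stand-in for the unique hash, and the pairing of names with values is hypothetical. The patent does not specify the hashing function.
```python
# Hypothetical records; the pairing of names and values is an assumption.
records = [("Mr. Smith", "1234"), ("Mr. John", "5678")]


def scan_lookup(name):
    """Legacy-style search: visit records one by one and count the visits."""
    visited = 0
    for key, value in records:
        visited += 1
        if key == name:
            return value, visited
    return None, visited


# A dict is a hash table: the key is hashed to locate the entry directly.
index = dict(records)


def hash_lookup(name):
    """Hash-style access: no other record is traversed."""
    return index.get(name)
```
  • Here scan_lookup("Mr. John") has to pass over Mr. Smith's record first, while hash_lookup("Mr. John") goes straight to the matching entry.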
  • legacy file systems store such data in file formats, which can only be acted on by file system languages, such as OQL.
  • Although database applications naturally send commands to an OS in table languages (such as SQL), the commands must be translated to an OS compliant language to act upon the data, which is stored in a file format. Because the OS languages do not have the flexibility of the table languages, the potential to utilize table language flexibility on table and record formatted data that incorporates hashing functions is lost. Further, the additional translation from SQL to OQL substantially slows file retrieval, storage and manipulation processes.
  • commands received from applications to act upon the data stored in the data matrix are translated to commands in a table language.
  • a third embodiment of the invention involves reprogramming applications using APIs provided by the VFS.
  • the APIs allow users to specify data definitions for data to be stored. These data definitions include, for example, the characteristics that the data has been associated with.
  • The data definition table may include data definition sub-structures (DDSS) such as "Employee Name" and "Employee ID," and the actual file data can be the names of the employees such as "Mr. Smith" and "Mr. John" and employee identifications such as "1234" or "5678".
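  • As a rough illustration of data definitions sitting alongside the data, the sketch below uses Python's built-in sqlite3 module as a stand-in for the VFS storage. The table name "employee" and the pairing of each ID with a name are assumptions made for the example, not details from the patent.
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The column names play the role of the DDSS; the table name is hypothetical.
conn.execute(
    'CREATE TABLE employee ("Employee ID" TEXT PRIMARY KEY, "Employee Name" TEXT)'
)
# Assumed pairing of identifications and names.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("1234", "Mr. Smith"), ("5678", "Mr. John")],
)

# The definitions can be read back independently of the stored data.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# The data itself is retrieved by naming the definition it belongs to.
name = conn.execute(
    'SELECT "Employee Name" FROM employee WHERE "Employee ID" = ?', ("5678",)
).fetchone()[0]
```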
  • an application directly uses data created by a different application without the use of middleware or clipboards. Improved data sharing is achieved. This enables the application to retrieve data from the media as well as formatting and other like information stored by the other application in the data definition tables.
  • Figs. 3a, 4a and 5a show how legacy file systems 300 are organized and store data.
  • The hierarchical structure comprises a Disk 310 (or long term memory), which is typically partitioned.
  • Each partition or sub-disk 315 contains a root 320, which contains basic file specific information such as file name, file size, user privilege information and the like.
  • Most file systems also have a File Allocation Table (“FAT”) (Fig. 4a) that contains pointers for each file that is created.
  • the pointers point to the starting location 520 of the clusters 515 (Fig. 5a) on which the file is saved.
  • The user space, which is the entity where the data resides, is made up of directories 325. Files may be stored in parent directories 325 or in sub-directories 330.
  • The directories 325 and sub-directories 330 serve as containers for holding one or more files.
  • Legacy file systems receive data from applications and translate the file data into a file format, as depicted in Fig. 4a.
  • File information is stored in the root directory entry 405.
  • the Master Table or ROOT 410 contains values such as File ID 415, Name 420, Path 425, Size 430, Attributes 435.
  • The Transaction Table 440 comprises File ID 445 as well as File Data 450.
  • This File ID 445 is a foreign key that contains pointers to the start of the files stored on the media called the file data.
  • the ROOT puts several constraints on certain fields such as proscribing files having the same name. Information about a file object (that is name, size, date time etc) is saved in the root directory entries with respect
  • Data is stored in a file format in a long term memory having a disk by way of a proprietary file system.
  • The disk comprises numbered sectors 505. Each sector 505 on the disk is, for example, 512 bytes. In one legacy system, four sectors comprise a cluster 515, and each cluster 515 is two kilobytes in size.
  • the Beginning of the file (BOF) 520 and End of the File (EOF) 525 are 10 kilobytes apart.
  • When the file is opened in Read/Write mode 530, by way of the file handle offset, the file is opened from the BOF 520.
  • In an append mode 535, by way of the file handle offset, the file is opened from the EOF 525.
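  • The arithmetic behind the example above (512-byte sectors, four sectors per cluster, a 10 kilobyte file) can be worked through in a short sketch. The function names are hypothetical and only illustrate the calculation.
```python
import math

SECTOR_BYTES = 512                                   # sector size from the example
SECTORS_PER_CLUSTER = 4                              # four sectors comprise a cluster
CLUSTER_BYTES = SECTOR_BYTES * SECTORS_PER_CLUSTER   # 2048 bytes, i.e. two kilobytes


def clusters_needed(file_bytes: int) -> int:
    """Number of whole clusters a file of the given size occupies."""
    return math.ceil(file_bytes / CLUSTER_BYTES)


def slack_bytes(file_bytes: int) -> int:
    """Allocated but unused bytes in the file's last cluster."""
    return clusters_needed(file_bytes) * CLUSTER_BYTES - file_bytes


def initial_offset(file_bytes: int, mode: str) -> int:
    """Starting handle offset: BOF (0) for read/write, EOF for append."""
    return file_bytes if mode == "append" else 0
```
  • A 10 kilobyte file therefore occupies exactly five clusters, while a 100-byte file still occupies one full cluster and leaves 1948 bytes unused.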
  • the ROOT contains specific file information, such as file size, file name, and other file information.
  • The FAT comprises cluster information, which is generally a unique pointer pointing to the start of each individual file.
  • legacy file systems are restricted to primitive OS languages to act upon the data, i.e., to store, retrieve, delete and modify the data.
  • Figs. 3b and 14 show how an embodiment of a VFS is organized and stores data.
  • The VFS embodiment shown in Fig. 3b comprises a data matrix 1462 (labeled a database
  • Each tablespace 345 comprises Extents 350, which comprise blocks 355.
  • Each block 355 comprises one or more tuples 360, and each tuple comprises one or more cells.
  • the entire file system is stored as one entire data matrix 1462 spread over multiple clusters.
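  • The nesting of tablespaces, extents, blocks, tuples and cells can be pictured with a minimal sketch. The class and field names below are hypothetical and say nothing about the on-disk layout, which the text does not specify.
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TupleRow:
    cells: List[bytes] = field(default_factory=list)    # one or more cells


@dataclass
class Block:
    tuples: List[TupleRow] = field(default_factory=list)


@dataclass
class Extent:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Tablespace:
    extents: List[Extent] = field(default_factory=list)


def count_cells(space: Tablespace) -> int:
    """Walk tablespace -> extents -> blocks -> tuples and count the cells."""
    return sum(
        len(row.cells)
        for extent in space.extents
        for block in extent.blocks
        for row in block.tuples
    )


# One extent holding one block with two tuples (three cells in total).
space = Tablespace([Extent([Block([TupleRow([b"a", b"b"]), TupleRow([b"c"])])])])
```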
  • the structures used to implement the data matrix 1462 include a B-tree, a two-dimensional
  • The primary key (unique hash) for file data that is to be stored in a cell of the data matrix 1462 can be interpreted as a combination of the path of the parent container (which would be files beginning with "J" in the example provided above) and the name of the file object itself (which would be "Mr. John" in the example provided above).
  • the unique hash provides a unique identification number that can be
  • The VFS is implemented on any system comprising a processor 1210, such as a microprocessor, long term memory, such as a disk 1230, and a short term memory, such as a Random Access Memory ("RAM") 1220.
  • the system may include, for example, a computer, a client or server within a client-server environment, a
  • the VFS is stored on the long term memory 1230 and the components of the VFS (described below) are transferred to the RAM 1220 as and when required.
  • the components of the VFS instruct the processor 1210 how and when to act upon the data.
  • the data is stored on the long term memory 1230 and is swapped between the long 1230 and short term 1220 memories when required.
  • Figs. 6 and 7 show a block diagram and the corresponding flow representation of one embodiment of a VFS of the invention.
  • The Network Agent 675 interfaces with other systems in a client server environment. It carries user command packets to the kernel message pipe across various protocol layers.
  • When a command is received at the VFS, the Command Analyzer 605 is responsible for analyzing the command to be performed on the VFS.
  • the operating system supports applications and associates each application with data generated by the application in a file.
  • the Command Analyzer 605 associates and executes the application based on the header information. Since the VFS has operating system files persisted in a database compliant format, particularly an ODBC / OLEDB compliant format, the commands require translation to SQL to access and manipulate the data.
  • The Command Analyzer 605 includes a language translation table (described in greater detail at the "Language Translation Table Description" section) and translates the OQL to SQL.
  • the Command Analyzer 605 sends the command to the Option Validator 625.
  • the Option Validator 625 is responsible for verification of the arguments associated with the command.
  • the Operation Analyzer 630 receives the command from the Option Validator 625 and checks the commands for syntactic errors. It estimates the various resource requirements that would be required by the command for execution. A resource that is currently in a state of exclusive use may be required for current command execution and is monitored by the Operation Analyzer 630 for availability.
  • The Operation Analyzer 630 may defer or stall the execution of the current command based on user privileges or command execution hierarchy.
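  • The flow from command analysis through argument validation to execution can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the command verbs ("dir", "type", "del"), the "files" table and its columns, and the use of Python's sqlite3 module in place of the VFS. The patent's actual translation table is not reproduced.
```python
import sqlite3

# Hypothetical translation table: file-style command -> SQL statement.
TRANSLATION = {
    "dir": "SELECT name FROM files WHERE parent = ? ORDER BY name",
    "type": "SELECT data FROM files WHERE parent = ? AND name = ?",
    "del": "DELETE FROM files WHERE parent = ? AND name = ?",
}
ARG_COUNT = {"dir": 1, "type": 2, "del": 2}   # expected arguments per command


def translate(command_line: str):
    """Analyze the command, validate its arguments, return SQL and parameters."""
    verb, *args = command_line.split()
    if verb not in TRANSLATION:
        raise ValueError(f"unknown command: {verb}")
    if len(args) != ARG_COUNT[verb]:
        raise ValueError(f"{verb} expects {ARG_COUNT[verb]} argument(s)")
    return TRANSLATION[verb], tuple(args)


conn = sqlite3.connect(":memory:")            # stand-in for the data matrix
conn.execute(
    "CREATE TABLE files (parent TEXT, name TEXT, data BLOB, PRIMARY KEY (parent, name))"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("/docs", "b.txt", b"two"), ("/docs", "a.txt", b"one")],
)


def run(command_line: str):
    """Execute the translated query and return all result rows."""
    sql, params = translate(command_line)
    return conn.execute(sql, params).fetchall()
```
  • In this sketch translate stands in for the Command Analyzer and Option Validator, and run for the Execution Engine; privilege checks and resource management are left out.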
  • the ODBC or OLEDB driver 610 illustrated is responsible for connecting the applications to the VFS.
  • the Scripting and Parsing Engine 615 allows the VFS to communicate with legacy applications and provides interoperability across different legacy applications.
  • legacy applications communicate with the VFS by way of their proprietary commands.
  • The Scripting and Parsing Engine 615 also provides applications access to data stored on the VFS without kernel level API usage. Commands from legacy file systems, such as New Technology File System (NTFS) from Microsoft or High Performance File System (HPFS) from IBM, are translated to a table language by the File System Translator module.
  • the Audit or Security Manager 635 checks whether the user accessing the database has been approved with privileges to access or modify the file data. Before the command is executed, the Resource Manager 640 allocates and de-allocates resources required by the command to ensure smooth functioning.
  • the File Translator 650 is responsible for creating the unique hash and storing the data referred to in the command to a particular cell within the data matrix 1462. Because the VFS stores data in a table structure format that is database compliant, data received from legacy applications is translated into the table structure and stored on the VFS in the table structure format. The File Translator 650 performs these functionalities. Once the data has been translated, the Disk Agent 665 is responsible for storing the data in the data matrix 1462 on the media.
  • the Disk Agent 665 manages these operations in conjunction with the Cache Manager 655.
  • the Cache Manager 655 preserves and maintains clean buffers to store temporary data, i.e., data between translations and between the long and short term memories as per the instruction of the Execution Engine 645.
  • the Disk Agent 665 also manages the swapping between the resources, such as between long and short term memories when the Resource Manager runs out of short term memory and requests a swap.
  • the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
  • the Execution Engine 645 executes the SQL query.
  • the translated command is received by the Execution Engine and executed to act on the data.
  • the Execution Engine 645 also instructs the Cache Manager to store or release data based on the need of the query.
  • The Error Handler 670 triggers an error when execution of the query fails.
  • VFS components may also be stored on a portable media, such as optical and magnetic discs and memory chips, in other embodiments.
  • the portable media may be used to transfer the VFS to a system.
  • Fig. 13 describes a flow chart representing embodiments of the invention.
  • In one embodiment, a legacy operating system file system can be replaced by the VFS; in another embodiment, the VFS can be installed on a legacy operating system file system with capabilities to override the legacy file system's functionalities 1310, so that the legacy system with the VFS of the invention stored thereon stores data received from applications in table structure format.
  • legacy applications have not been reprogrammed using the VFS APIs available, and therefore cannot make use of data definition tables.
  • Applications such as word processors, spreadsheets, drawing and paint applications send data to the VFS file system to be stored on the long-term memory 1230 at step 1320.
  • The VFS file system translates the data (step 1330) into table structure format.
  • the table structure format includes tables having records.
  • the primary key (unique hash) 1456, 1458, 1460 for the table is created using a combination of the path of the parent directory and the name of the file object itself.
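  • One way to derive such a key is sketched below. The patent does not name a hashing function, so the choice of SHA-256, the separator byte and the function name unique_hash are all assumptions made for illustration.
```python
import hashlib


def unique_hash(parent_path: str, name: str) -> str:
    """Combine the parent directory path and the file object name into one key."""
    # A NUL separator (assumed) keeps ("a", "bc") distinct from ("ab", "c").
    combined = f"{parent_path}\x00{name}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()
```
  • The same path and name always produce the same key, so the key can be recomputed at retrieval time instead of being searched for.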
  • each file is comprised of extents.
  • the data matrix 1462 comprises headers 545, which include data definition information regarding the version of the data matrix 1462, how the VFS stores information 545 and other metadata information.
  • the extents further comprise a block header 545, tuple index and tuples.
  • the tuples contain the actual file data in the form of LOB data 560.
  • The tuples are assigned unique ROWIDs, and each row comprises one or more cells.
  • the media comprises the data matrix 1462, which behaves like a single database file. The physical persistence of this data is in a table record, and the actual file data is saved as LOB data.
  • LOB data is the actual file data, which legacy systems save in a series of linked clusters and which VFSs of the invention save within cells of the data matrix 1462.
  • the data is stored 1340 in the cell associated with the unique hash.
  • the hash 1446, 1450 is obtained from the lookup table 1454.
  • The data is obtained from the cell associated with the unique hash and sent to the application from the VFS. Because a unique hash 1446, 1450 in the lookup table shows which cells hold the data desired by a user, file data retrieval is improved. A similar interpretation of any database object is saved in a Free Extent Table (FET), showing free space available, and a Used Extent Table (UET), showing space already allocated.
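  • The store and retrieve steps can be sketched with two in-memory dictionaries standing in for the lookup table and the data matrix. The names lookup_table, data_matrix, store and retrieve, and the use of SHA-256, are hypothetical; the sketch does not model extents or the FET and UET.
```python
import hashlib

lookup_table = {}   # (parent path, file name) -> unique hash
data_matrix = {}    # unique hash -> cell holding the file's LOB data


def store(parent: str, name: str, lob: bytes) -> str:
    """Create the unique hash and place the LOB data in the matching cell."""
    key = hashlib.sha256(f"{parent}\x00{name}".encode("utf-8")).hexdigest()
    lookup_table[(parent, name)] = key
    data_matrix[key] = lob
    return key


def retrieve(parent: str, name: str) -> bytes:
    """Obtain the hash from the lookup table, then read the cell directly."""
    return data_matrix[lookup_table[(parent, name)]]


store("/docs", "report.doc", b"file contents")
```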
  • Database applications create data in tables having records.
  • Legacy operating systems translate data created by the database applications into a file format and store the file data in the file format. This involves an unnecessary level of translation. For example, reading, writing or seeking a file in a legacy file system is reconfigured and translated by the legacy system into device driver calls, which go to the Basic Input Output System (BIOS) and finally to the physical sector with a read/write/seek.
  • Database applications send SQL or other database specific queries to the OS, and the queries are converted to either a read or a write operation. The operating system understands the read or write operation to either access or retrieve data as per the command.
  • SQL is a functionally rich language providing SQL users with substantial flexibility to access and modify data. Legacy operating system file systems failed to provide such flexibility, and users were restricted to using primitive commands such as read, write, open and close to act on the data. Only after the file data was retrieved from the legacy file system was the SQL functionality realized.
  • Legacy operating systems do not allow direct language support. Instead, legacy operating systems only allow certain batch commands or shell scripts to be executed, which are normally passive. These scripts are preprogrammed and need to be constantly running in order to be executed. The scripts or commands are interpreted by the command kernel and executed when needed.
  • Because the present invention stores file data 1340 in a table structure format that is database compliant and, in one embodiment, ODBC compliant, there is no need to translate file data (which is in the form of tables and records) received from a database application when storing the database file data or to translate database queries from a database language to a table language.
  • the advantage in this case is the elimination of the double translation that would be required between the database and the operating system and back since the operating system itself stores data in a database compliant format, particularly an ODBC compliant format.
  • All applications can avail themselves of the benefits of table languages such as SQL by way of translating commands in other languages to table languages such as SQL. (Translation of commands is described in greater detail below.)
  • In addition to using a VFS, applications are programmed using APIs provided by the VFS.
  • the VFS APIs allow applications to write data definitions in a global data definition table 1402.
  • the data definition tables serve as metadata for the data stored in the records and tables.
  • The data definition table 1410 corresponding to Microsoft Word can be broken down into data definition substructures 1412, 1416, 1420, 1424, 1428, 1432 (DDSS) such as "Document", "Page", "Paragraph", "line", "Word", "Character," as shown in Fig. 14.
  • each corresponding data definition table includes DDSS 1412, 1416, 1420, 1424, 1428,1432 that are based on the functionality of the application.
  • Microsoft Word data definition table 1410 type including the DDSSs 1412, 1416, 1420, 1424, 1428,1432 provided above.
  • the data definition table types 1410 for the applications running on a system are combined in a global data definition table 1402.
  • the sub-structure information for individual files is stored as individual file specific sub-structure (FSSS) 1414, 1418, 1422, 1426, 1430, 1434 within each of the DDSSs 1412, 1416, 1420, 1424, 1428,1432.
  • each cell includes pointers to the actual file data, which is stored on the media in table structure format using records and tables.
  • Microsoft Excel has "Document" and "Page" DDSSs 1438, 1442 in addition to other DDSSs, such as "Row" and "Column" 1446, 1450, to facilitate Excel functionality.
  • The DDSSs allow data exchange between different applications.
  • Microsoft Excel can directly use Microsoft Word data, such as information associated with the "Document” and “Page” DDSSs, without the use of middleware applications or clipboards.
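  • A minimal sketch of this kind of sharing follows. The dictionary layout, the file name "report" and the stored values are all hypothetical; only the DDSS names come from the description above.
```python
# Global data definition table: application -> its DDSS names.
GLOBAL_DDT = {
    "Microsoft Word": ["Document", "Page", "Paragraph", "Line", "Word", "Character"],
    "Microsoft Excel": ["Document", "Page", "Row", "Column"],
}

# Hypothetical file specific sub-structures (FSSS): (application, file, DDSS) -> values.
FSSS = {
    ("Microsoft Word", "report", "Document"): {"title": "Q1 report"},
    ("Microsoft Word", "report", "Page"): {"count": 3},
    ("Microsoft Word", "report", "Paragraph"): {"indent": 2},
}


def shared_substructures(reader: str, owner: str) -> list:
    """DDSS names that both applications define, in the owner's order."""
    known = set(GLOBAL_DDT[reader])
    return [name for name in GLOBAL_DDT[owner] if name in known]


def read_shared(reader: str, owner: str, file_name: str) -> dict:
    """Return the owner's FSSS entries that the reader can interpret."""
    return {
        name: FSSS[(owner, file_name, name)]
        for name in shared_substructures(reader, owner)
        if (owner, file_name, name) in FSSS
    }
```
  • In this sketch Excel reads the "Document" and "Page" entries of a Word file directly from the shared tables and skips "Paragraph", which it does not define.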
  • Data definition table types may be configured for several different applications and objects, such as, for example, spreadsheets, drawing applications, web sites and numerous others. Another example showing how data is stored is the procedure for storing an image file, such as a "Bitmap File".
  • the Bitmap File can either be stored as Large Object (“LOB”) data or can be broken into rows of pixels or columns of pixels.
  • the data definition table type for Bitmap images can include information regarding attributes associated with the image such as color information, size information, and the like.
  • the LOB data is stored in a single cell of the data matrix 1462.
  • The image is broken down into a matrix of pixels and stored on the media as rows of pixels, and the DDSSs can contain information regarding each pixel row of the image.
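  • Breaking an image into pixel rows can be sketched as below. The sketch assumes one byte per pixel and a flat byte buffer, neither of which is stated in the text, and the function name to_pixel_rows is hypothetical.
```python
def to_pixel_rows(pixels: bytes, width: int) -> list:
    """Split a flat pixel buffer into rows of `width` pixels (one byte each, assumed)."""
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("buffer length must be a positive multiple of width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# A 3 x 2 image: each row could be stored as its own record,
# while the unsplit buffer corresponds to the single-cell LOB option.
image = bytes(range(6))
rows = to_pixel_rows(image, 3)
```
  • Joining the rows back together reproduces the original buffer, so the two storage options hold the same data at different granularity.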
  • a Microsoft Word document can store individual information regarding each substructure in file specific substructures within the sub-structures.
  • the paragraph structure 1422 can contain information regarding the number of lines in the paragraph, the indentations to be applied to the paragraph etc. of each document. Therefore, while exchanging only parts of the data, the formatting information of that paragraph can be retained. This is possible as the information regarding the data is stored in the data definition tables 1410. Also the application itself can access only parts of its own stored data.
  • the archival of file data content into a table structure format i.e., into tables and records, and preferably into an ODBC compliant RDBMS format with data definition tables, allows for data exchange across applications independent of operating system file formats.
  • This data exchange is possible when applications are modified and built using VFS APIs.
  • the VFS APIs allow programmers to use data definitions. These data definitions are stored in a global data definition table 1402. To exchange data between applications, one application reads the data definitions and using the data definitions, understands and retrieves data.
  • Data exchange is useful in business process automation because data integration, data sharing and presentation formats are more structured in an RDBMS than they are across custom built applications and are also more structured than are tools for performing specific tasks. Such tools incorporate a specific proprietary format. For example, manufacturing and engineering units have a lot of process control instrumentation applications, which dictate the production cycle. Commercial units of the business, however, do not need the technical data generated in the production cycle. Commercial units only require Management Information System (MIS) feedback. Translating the technical data to management specific information is normally a very tedious process. The data exchange across application specific tools requires middleware, and the reengineering process, which implicates understanding the source and target formats and the business process requirements, is cumbersome.
  • the VFS APIs also provide table language functionality to the applications so that the applications send commands in a table language format.
  • translation of a language to a table language is not necessary because the commands are already in a table language.
  • the translation is obviated.
  • commands and queries can be accepted, and the data associated with the query can be stored in table structures, and preferably in an ODBC compliant format.
  • Users and programmers may use rich languages, such as SQL, to access and manipulate the data without undergoing multiple levels of translation.
  • legacy file systems utilized caching utilities, such as SmartDrive, Norton Change Directory, and FindFast.
  • the caching utilities store data in the cache for predetermined time periods or for a period of time that is based on the amount of data usage. This eliminates the need to fetch data from the long term memory 1230 every time.
  • the physical file data is indexed and persisted into a database by the operating system. As a result, the dependency of following the operating system specific file system and multiple levels of translation is avoided.
  • commands received in one language are translated to a table language, such as SQL, if the language the command is received in is not a table language.
  • FIG. 15 illustrates a flow chart for translating a command.
  • the commands received in an OS language, such as OQL, are translated to a table language, such as SQL.
  • An example of the translation is described below.
  • the translation process is described in the context of an application client connecting and retrieving data from the VFS. Those of skill in the art will appreciate, however, that in other embodiments the translation process may also take place in stand alone environments.
  • Fig. 7 is a flow chart of an embodiment of the invention in a context. The flow particularly depicts an embodiment of the current invention, wherein an application client connects and retrieves data from the VFS.
  • the client is a legacy application such as a File Transfer Protocol ("FTP") client that is not configured to fire SQL queries.
  • the VFS continuously monitors for client requests. Once a client request for a connection is received, a connection is setup and commands sent by the client are analyzed using the Command Analyzer 605.
  • the step of analyzing the command 702 comprises identifying and segregating each command into either a File Command 704 or a Directory Command 708.
  • the command analyzer 605 identifies the application associated with a particular file based on information stored in the header or data definition tables. For example, "Text files" are associated with Notepad in Microsoft Windows and "BMP" files are associated with Microsoft Paint as a default painting tool.
  • the Command Analyzer 605, in conjunction with scripting interfaces, can be customized to execute various file system commands and also provides for operating system file system interoperability. For example, the "DIR" command in DOS with various argument options has equivalent functions such as "ls -l /a/d" in Linux or UNIX.
  • the Operation Analyzer 630 carries out further processing.
  • the Operation Analyzer 630 validates the correctness of the command from a syntactic standpoint. Since specific commands may require a lot of processing power and utilize additional time and system resources, the Operation Analyzer 630 may buffer or stall the execution of that command based on user privileges or command execution hierarchy. For example, a directory search operation to list all the directories may require traversing the file system and various partitions, which will be overridden by a shutdown command.
  • the Option Validator 625 verifies the arguments specified in the command when commands are given in conjunction with other file system commands. It also verifies the arguments of the command when the result of one command serves as an argument for another command to be executed.
  • the Option Validator 625 is also responsible for estimating the resource requirements for the command to be executed and maintains a track of current resource availability.
  • the commands are identified as a file command, step 704, or a directory command, step 708. If any part of the process fails to be executed, at step 710, the Error Handler generates an appropriate error message.
  • the Command Analyzer translates the command from the language in which it is received into a table language.
  • the command is translated from OQL to SQL.
  • the translation is necessary when an application sends a command to the VFS in a language other than a table language that acts on a database.
  • an FTP client may request a "RENAME" command (which is an OQL command) to rename a file or folder on the VFS.
  • the OQL command will be translated to a corresponding SQL command such as "UPDATE" with the appropriate arguments and be executed on the VFS.
  • the method of translation includes the Command Analyzer 605 analyzing whether the command requested is a DDL operation such as "Create" or "Alter" or "Drop" at step 714, or a DML operation such as "Select" or "Insert" or "Update" or "Delete" at step 728. While performing DDL operations at step 714, the Audit Manager 635 checks for user privileges at step 716. In the event it is a DML operation, step 728, the Command Analyzer 605 checks for any application associated with the command, step 730. (The Command Analyzer utilizes the language translation table, described below, when translating commands.) The Option Validator 625 checks for existence of the application, step 732, and the Execution Engine executes the command on the application with appropriate parameters, step 734.
  • commands received by the VFS in any language may be translated into other table languages that are capable of acting on data stored in table structure format.
  • Such languages are typically functionally rich and provide greater flexibility in the manner which data stored on the system may be acted upon, i.e., stored, modified, retrieved or deleted.
  • the Audit or Security Manager 635 verifies whether the user requesting the command has the necessary rights to execute the command. In the event the user does not have appropriate privileges, the Error Handler triggers an error at step 710. If the user has been granted access to execute commands on the VFS, the Option Validator 625 proceeds to check the arguments specified in the command at step 720. If the Option Validator finds invalid options in the command, the Error Handler is triggered to generate an appropriate error at step 710.
  • In the event the Option Validator 625 finds valid arguments within the command, the File Translator 650 proceeds to execute the arguments, which may include storing data identified in the argument, step 724.
  • the File Translator 650 stores the actual command argument object to a particular cell(s) or ROWID (a unique identifier for each ROW) in the data matrix 1462 on the media. Because the data is stored as a table structure, the data need not be translated for storage into the data matrix 1462, provided the data is supplied in a table structure format.
  • the data is stored in an appropriate cell based on a unique hash that is generated. When applications are written using the VFS APIs they can specify data definition information in a data definition table.
  • the data is stored at an appropriate location in the data matrix 1462 based on a unique hash.
  • the File Translator also works backwards in translating data retrieved from the media to application specific data formats as per the command.
  • Other modules of the VFS which need data of any file not present in the cache, interact with the Disk Agent 665 for persistence to or retrieval of data from the VFS.
  • the Disk Agent 665 also manages secondary storage when primary storage, such as RAM, is exhausted, i.e., when the Resource Manager runs out of RAM and demands swap to store the data onto the media.
  • the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
  • the Execution Engine 645 is equipped with the logic of executing operating system related functionality received from operating system related commands. It translates the user commands (with or without arguments) and performs the execution.
  • the command, which may be an internally fired SQL query, is executed in the same way as a query in a database query execution engine.
  • the execution may result in some storage of data, which may be part of VFS data definition table. For example, a command like "CD" (change directory) is intended to change the current directory whereas a "MD" (MKDIR) command with arguments will result in the actual creation of a directory.
  • the command for the creation of a directory is analogous to an "INSERT" statement in SQL. This command is interpreted by the VFS and checked for certain constraint validations. For example, creating a directory with the same name as an existing directory in the same parent directory is not permitted and generates an error.
  • the Execution Engine 645 is responsible for execution of the SQL queries that have been generated.
  • VFS provides support for a VFS communicating with a legacy file system and applications, or a VFS communicating with another VFS file system and applications built using VFS APIs. This is done by parsing and analyzing commands received from applications and translating them to queries that VFS can understand. Interoperability between VFS and legacy file systems is provided for by a file system translator 620. In one embodiment, interoperability is provided for numerous file systems including the New Technology File System (NTFS) from Microsoft and the High Performance File System (HPFS) from IBM.
  • For interfacing other machines or other file systems to the VFS system, a Network Agent module 675 is supported, which exhibits functionality to carry user commands through the network stack across various protocol layers.
  • Fig. 8 is a diagram and embodiment of a language translation table of the current invention depicting the interpretation of OQL commands as SQL queries and vice-versa.
  • Fig. 8 shows a legacy file system 800, which is in communication with the VFS.
  • the legacy file system uses primitive file and directory commands which need to be interpreted for accessing data on the VFS. This interpretation involves translating an OS language, such as an Object Query Language (OQL), to a table language, such as a Structured Query Language (SQL) 805.
  • the "DIR" command 810 with various options such as "/OS/OD/OA/ad" is interpreted as a "SELECT" command 815 specifying arguments for accessing a specific row or column.
  • the file "COPY” command 820 with a source and target argument is interpreted after the OQL to SQL translation to either an "INSERT” or "UPDATE” statement 825.
  • "DEL" commands for deleting a file 830 with the path of the filename as an argument is interpreted as a "DELETE” 835 operation in SQL with the specific ROWID or COLUMNID as an argument.
  • RENAME command 840 translates to an "UPDATE" statement 845.
  • the "SEARCH” or "FIND” command 850 translates to "SELECT" statement 855.
  • a command for making a directory in a file system is usually executed as "MKDIR" or "MD" with the directory name as an argument 865 and creates a directory entry in the root with respect to its current parent path.
  • an "INSERT" query can create the directory entry, but the interpretation of the command ("MKDIR" / "MD") into an "INSERT" query has to be fortified by a logical interpretation of the file attributes, which separates the file from a directory in a standard file system. Also, failures in any such operations need to be tracked. For example, if a directory or file with the same name exists in the same parent path, an error message is triggered as per normal operating system file system logic.
  • Fig. 9 shows another embodiment of a language translation table wherein a VFS is in communication with a legacy file system.
  • a basic database query such as "Connect" 900, used with a username and password as arguments, grants access to all tables and records.
  • Disk Agent 665 is responsible for handling other file operations or commands such as "Seek” 925, "Tell” 930, “Read” 935, “Write” 940 etc.
  • the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
  • Fig. 10 illustrates another language translation table, in which other legacy file system commands are translated into queries in a table language, such as SQL, on the VFS.
  • a file command 1000 such as "Copy” 1005 is translated to an "INSERT” 1010 query on the VFS.
  • the file operation “Del” 1015 is translated to "DELETE” statement 1020 on the VFS.
  • the file operation "REN” 1025 used for renaming is translated to an "UPDATE” query 1030.
  • the file operation "MOVE" 1035 is translated to an "UPDATE” statement 1040.
  • the Directory operations 1085 such as "MD” with the directory name as arguments 1045 to make directory is translated to an "INSERT" statement 1050 on the VFS.
  • the directory operation such as "CD <directory name>" 1055 to change directory is translated to a "SELECT" query 1060 on the VFS.
  • the "RD” 1065 with the directory name as an argument to remove directory is translated to "DELETE" statement 1070 on the VFS.
  • the operations on the files are supported by the operating system.
  • the functionality varies in terms of security and monitoring shared resources.
  • the shared resources generally are directly translated by the file system, that is data shared is always accessed through the file system and basic file input or output commands operate in shared mode for a server based file system.
  • Commands such as "Seek” 1150 and “fseek” 1155 can manipulate the file handle, the identifier of a file, with respect to the block origin. Commands like “tell” 1160 and “ftell” 1165 can return the file offset with respect to block offset within the file system database.
  • the actual file data can be translated into and managed as database tuples, like any normal database record. The files to which the user has access are exposed through a database view, which prevents illegal access to other records (that is, files) in the table.
  • Any operating system typically has a file classification associated with a predefined functionality or associated application; that is, certain files are binary executables such as ".exe" files, batch executables such as ".bat" files, or ".bmp" files linked with a painting application like Paintbrush. Whenever a user tries to execute these files, the VFS spawns / forks a process which performs this execution irrespective of the operating system on which the VFS is implemented.
  • the associations of the file functionality or any associated application are saved in the metadata of the VFS database.
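
The command translation summarized in the tables of Figs. 8 and 10 can be illustrated with a minimal sketch. It is not the implementation of the invention: the `entries` table, its columns, and the helper names `parent_of`, `translate` and `run` are hypothetical, and SQLite stands in for whatever table store an embodiment would use. Only the command-to-statement pairing (MD to INSERT, COPY to INSERT, REN and MOVE to UPDATE, DEL and RD to DELETE, DIR and CD to SELECT) is taken from the text.

```python
import sqlite3

# Hypothetical schema: one row per file or directory entry.
SCHEMA = """
CREATE TABLE entries (
    path   TEXT PRIMARY KEY,  -- full path; uniqueness rejects duplicate names
    parent TEXT NOT NULL,     -- path of the parent directory
    is_dir INTEGER NOT NULL,  -- attribute separating a directory from a file
    data   BLOB
)
"""

def parent_of(path):
    """Return the parent directory of a '/'-separated path."""
    parent, _, _ = path.rpartition("/")
    return parent or "/"

def translate(command, *args):
    """Map a legacy command to a (sql, parameters) pair."""
    cmd = command.upper()
    if cmd in ("MD", "MKDIR"):
        (path,) = args
        return ("INSERT INTO entries (path, parent, is_dir, data) "
                "VALUES (?, ?, 1, NULL)", (path, parent_of(path)))
    if cmd == "COPY":
        src, dst = args
        return ("INSERT INTO entries (path, parent, is_dir, data) "
                "SELECT ?, ?, is_dir, data FROM entries WHERE path = ?",
                (dst, parent_of(dst), src))
    if cmd in ("REN", "RENAME", "MOVE"):
        src, dst = args
        return ("UPDATE entries SET path = ?, parent = ? WHERE path = ?",
                (dst, parent_of(dst), src))
    if cmd in ("DEL", "RD"):
        (path,) = args
        want_dir = 1 if cmd == "RD" else 0
        return ("DELETE FROM entries WHERE path = ? AND is_dir = ?",
                (path, want_dir))
    if cmd == "DIR":
        (path,) = args
        return ("SELECT path FROM entries WHERE parent = ? ORDER BY path",
                (path,))
    if cmd == "CD":
        (path,) = args
        return ("SELECT path FROM entries WHERE path = ? AND is_dir = 1",
                (path,))
    raise ValueError(f"unsupported command: {command}")

def run(conn, command, *args):
    """Translate a legacy command and execute the resulting SQL."""
    sql, params = translate(command, *args)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
run(conn, "MD", "/docs")
conn.execute("INSERT INTO entries VALUES (?, ?, 0, ?)",
             ("/docs/a.txt", "/docs", b"hello"))
run(conn, "COPY", "/docs/a.txt", "/docs/b.txt")
run(conn, "REN", "/docs/b.txt", "/docs/c.txt")
run(conn, "DEL", "/docs/a.txt")
listing = [row[0] for row in run(conn, "DIR", "/docs")]
print(listing)
```

In this sketch the primary key on `path` plays the role of the constraint validation described above: a second `MD` for an existing directory fails with `sqlite3.IntegrityError`, which a caller could map to the usual file system error message.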

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

The invention generally pertains to a file system that stores data received from applications in a table structure format. In one embodiment, applications use APIs so that the applications send to the file system data that includes data definition tables. In another embodiment, commands received by file systems of the invention are translated to a table language that can act on data stored in the table storage format.

Description

TITLE OF THE INVENTION
A NOVEL FILESYSTEM
FIELD OF INVENTION
The present invention relates to computer processing systems and to a file system for storing and retrieving information.
BACKGROUND OF THE INVENTION
All software applications need to store, retrieve and modify information. Every system has a file system that controls how files are named and stored on physical media to enable easy storage and retrieval. A file system from a macro view is a collection of files and directories and the operations available to be performed on them. Since the origin of long-term storage media much effort has been made to create file systems that are easily accessible and that facilitate manipulation of files.
Applications send data in the form of a byte stream to an operating system. This byte stream contains information associated with the file being sent by the applications. The information may comprise the filename, the size of the file, the extension, the pattern of file usage with respect to the operating system functionality, the identity of applications associated with the file, and the like. However, at the physical level the operating system divides the byte stream received into chunks of data that are stored on media. The media is divided into sectors and clusters, with a number of sectors comprising a cluster. The byte stream is stored in clusters. This entire process is carried out by the operating system, which stores these byte streams into entities termed files. The byte stream may either contain data regarding a new file that is to be created and stored on the media or may contain information regarding an already existing file that needs to be updated, deleted or modified.
Fig. 1 illustrates a block diagram of the physical interpretation of a hard disk drive (HDD) on which the physical file system is stored. The HDD comprises platters 100, 105, 110 mounted over a central spindle 115. The disk heads 120, 125, 130, 135, 140, 145 that are positioned on both sides of each platter are capable of being repositioned for reading and writing. At the physical level, the OS divides the storage media, such as HDD, FDD, etc., into concentric tracks 150, 155, 160. These tracks 150, 155, 160 are divided to form sectors 165. A collection of such sectors 165 from various consecutive platters forms the cylinder 170. A collection of clusters is called a file. The physical sector grouping is OS specific, which means that the number of sectors in a cluster is different on different operating systems.
Fig. 2 is a block diagram showing how a legacy file system organizes and stores files and file related data. Each operating system follows its own proprietary protocol of managing byte streams of data in an effort to attain optimal performance and resource utilization. Each OS vendor utilizes its own proprietary file system that is optimized based on OS specific requirements. For example, Raw HDD 200 illustrated in Fig. 2 is partitioned into <C:> 210 and <D:> 215 using a command such as 'fdisk' 205. The C: partition 210 can then be formatted using a 'FORMAT' command. The formatting procedure creates a ROOT 220, FAT 225 and user space 230 on the formatted partition. The ROOT 220 comprises file details, for example, the size of the file, the file name, user privilege attributes and the like. The File Allocation Table (FAT) is a table that contains information showing where the file is stored. It provides file specific pointers to the sector on the hard disk where the corresponding file begins. The HDD 235 also includes user space 230 for collection and storage of actual data. Such user space can either be used space 240, i.e., space that has already been allocated to files, or free space 245, i.e., space that has yet to be allocated to files.
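The lookup path through ROOT, FAT and user space can be sketched as follows. This is a simplified illustration with made-up values, not a byte-accurate FAT layout: the dictionaries `root`, `fat` and `user_space`, the `END_OF_CHAIN` marker and the 4-byte cluster size are all assumptions chosen for readability. The sketch follows the common FAT convention in which the directory entry records the first cluster and the table links each cluster to the next one.

```python
END_OF_CHAIN = -1  # hypothetical marker for the last cluster of a file

# ROOT: file name -> file details, including the first cluster (made-up values).
root = {"REPORT.TXT": {"size": 10, "first_cluster": 2}}

# FAT: cluster number -> next cluster belonging to the same file.
fat = {2: 5, 5: 3, 3: END_OF_CHAIN}

# User space: cluster number -> bytes held in that cluster (4-byte clusters).
user_space = {2: b"ABCD", 3: b"IJ\x00\x00", 5: b"EFGH"}

def read_file(name):
    """Consult ROOT, then follow the FAT chain through the user space."""
    entry = root[name]
    cluster = entry["first_cluster"]
    chunks = []
    while cluster != END_OF_CHAIN:
        chunks.append(user_space[cluster])
        cluster = fat[cluster]
    # The last cluster is only partly used, so trim to the recorded size.
    return b"".join(chunks)[: entry["size"]]

print(read_file("REPORT.TXT"))  # b'ABCDEFGHIJ'
```

Note that the file occupies clusters 2, 5 and 3 in that order, so the data is not contiguous on the media, and every read has to traverse the chain.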
When a user accesses a file, the operating system references information available in the ROOT and FAT and then traverses through the clusters to retrieve file data contained in the user space. To facilitate storage of files by users into user defined groups, the files can be stored into logical entities called directories. The directories act as containers for the files within them. The directory file structure is analogous to a parent child relation, and this logical relation, with respect to physical media storage, constitutes what is popularly known as a file system.
Several file systems have been developed. For example, a variety of 12-bit, 16-bit and 32-bit FAT file systems and numerous other file systems such as FAT32 implemented in Windows 98 from Microsoft Corporation, HPFS implemented in OS/2 from IBM, and NTFS implemented in Windows NT from Microsoft Corporation have been developed. The FAT32 file system is the first 32-bit file system and was implemented by Microsoft in the Windows 98 operating system. In addition to integrated 32-bit file and disk access, the structure on the disk was basically the same as a previous file system called DOS v4. Most of the Windows 98 OS properties were similar to the DOS v4 OS to facilitate backward compatibility. Windows 98 was a single user, multitasking system. It lacked multi-user support and an important feature of the Windows operating systems called 'scandisk'. 'Scandisk' was not included on Windows 98 because 'scandisk' was a 16-bit program which could not process volumes using the FAT32 file system that has a FAT larger than 6 MB or less than 64 KB in size.
Another file system called the HPFS file system was developed for the OS/2 operating system provided by International Business Machines ('IBM'). This file system addressed several shortcomings of previous file systems and included advantages such as optimized file system performance and the ability to recover from more severe faults than other file systems existing at that time. However, the HPFS implementation required high processing power, high memory and large disk space, making it difficult to use and execute applications on OS/2. Further, the HPFS file system lacked good security and efficient search functionality for the files stored on the media.
The next generation of file system by Microsoft sought to address these shortcomings. Microsoft launched the Windows NT OS, which included the New Technology File System (NTFS). NTFS enabled a multi-user and multitasking environment. Although NTFS addressed some of the issues arising from multi-user environments, long file names and the like, the NTFS file system failed to sort every file and directory or index the files and directories to enable quick retrieval. This made searching for files in NTFS file systems slow.
Another drawback with current file systems is that they only provide users with primitive commands. For example, legacy OSs only include primitive commands such as read, write, flush, open and close type commands for file access and manipulation. These commands do not provide the user with much flexibility to manipulate the data stored on the media.
Another problem not adequately addressed by NTFS includes issues arising from fragmentation of files. Since all operating systems use clusters to store files, and cluster size is predetermined, OSs often use more space than is required by a file to store the file on media. This leads to fragmentation of the FAT disk and further reduces file access speed in addition to unnecessary memory wastage. Although this was addressed in OS/2 and Windows NT file systems by searching for the maximum amount of continuous unused space that would be large enough to accommodate the file before recording it, the problem was not completely addressed.
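The space overhead of fixed-size clusters can be made concrete with a short calculation. The function name `allocated_bytes`, the 512-byte sector size and the choice of 8 sectors per cluster are illustrative assumptions; as noted in the text, the number of sectors per cluster differs between operating systems. The sketch covers only the wasted space, not the fragmentation that the paragraph also mentions.

```python
import math

def allocated_bytes(file_size, sectors_per_cluster, sector_size=512):
    """Space consumed on media when a file is stored in whole clusters."""
    cluster_size = sectors_per_cluster * sector_size
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size

size = 5000                                   # bytes of actual file data
used = allocated_bytes(size, sectors_per_cluster=8)
print(used, used - size)                      # 8192 3192
```

With 4096-byte clusters, a 5000-byte file occupies two clusters, so 3192 bytes of the second cluster hold no data.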
All applications written for a particular operating system are written using Application Program Interfaces ("APIs") that prescribe procedures the applications use to access the operating system. APIs contain procedures for the creation of the graphical user interface of the application, ways that the application can access I/O devices and the like.
Furthermore, operating systems do not have exposed APIs that can be used to store data directly on the file system. This causes application programmers to write applications which store data in their own proprietary formats without any definitions regarding the data. For example, Microsoft Word sends its data to the operating system in a different format than does Corel Draw or database applications. The proprietary format of data is translated into file format, i.e., received by the operating system and encapsulated by the operating system into a file. The file is either stored in contiguous sectors on the media or fragmented and stored in parts, based on the amount of contiguous space available. Therefore, even though the data is stored on media, the format of the data stored is proprietary to the applications that create the data.
Present day operating systems do not allow applications to store data definitions, i.e., definitions regarding the structure of the data being stored. Hence one application cannot directly access data created by another application. Microsoft facilitates multi-application data sharing by using a clipboard, which serves as a memory conduit to copy or move data across applications. However, when data is copied across applications, its formatting is often lost due to the lack of data definitions.
A database is important to every business. It is a powerful tool for creating and managing large amounts of data efficiently and for storing data over long periods of time. Further, the database has several advantages, such as long-term storage, meaning that data exists independently of any process that utilizes the data. Another advantage of a database is the flexibility to manipulate stored data in much more complex ways than the primitive file reading and writing functionality of legacy OSs.
For example, databases use some of the most sophisticated technologies to store, access and modify data. Present day databases use tables and records to store data. There are several ways that databases store data in tables and records. For example a unique hash can be generated for the type of data being received and likewise the data can be saved in a specific record or it can be saved in contiguous sectors in the table itself. These procedures facilitate easy data retrieval and access.
Another advantage available in databases is the provision of rich flexible languages such as SQL to access and manipulate data. SQL has been created specifically with the intention to access and manipulate large amounts of data. Instead of using the primitive commands that have been provided by the operating system, database users can now use a functionally rich language such as SQL to access and manipulate their data. One drawback experienced by databases is the translation between the database and the operating system that is always required. When a database is installed over the operating system, there is a lot of unnecessary wastage of system resources in the translation of this data from database to operating system to physical disk and further all the way back. This is due to the fact that the file system technology implemented by an operating system is different than the one used by a database. Since the commands to read and write data, like "fread" and "fwrite" are handled by the operating system, the database that fires an SQL query needs to translate the SQL query into a read or write command, which in turn fetches the data from the file system. The intended functionality of the query can only be performed once the data has been fetched by the operating system. This causes a level of translation and often times multiple levels of translation, which substantially slows down the access rate of the files.
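The layering described above, in which a query can only be evaluated after the operating system has fetched the file contents, can be illustrated with a deliberately naive sketch. It is an assumption-laden toy, not a description of how any real database engine reads its files: the table is kept in a CSV file, and the hypothetical `select_where` function stands in for the database translating a query into plain file reads.

```python
import csv
import os
import tempfile

def select_where(path, column, value):
    """Answer a 'SELECT ... WHERE column = value' query using file reads only."""
    with open(path, newline="") as handle:        # operating system level open/read
        rows = list(csv.DictReader(handle))       # byte stream fetched and parsed
    # The query condition can only be applied once the data has been fetched.
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)

with tempfile.TemporaryDirectory() as folder:
    table_file = os.path.join(folder, "customers.csv")
    with open(table_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "city"])
        writer.writerows([["John", "Pune"], ["Smith", "Mumbai"], ["Joan", "Pune"]])
    matches, rows_read = select_where(table_file, "city", "Pune")

print([m["name"] for m in matches], rows_read)    # ['John', 'Joan'] 3
```

All three records are read from the file even though only two satisfy the condition, which is the kind of extra work the translation between database and file system introduces.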
Present day file systems, however, do not efficiently utilize this database functionality. Since databases are also installed on operating systems, the tables and records created by the database are translated to a file format and are stored as a single file in legacy systems. However, the files contain the tables as described above, which provide for greater flexibility for accessing data.
There remains a need to address the problems associated with accessing and manipulating data on an operating system.
BRIEF DESCRIPTION OF THE DRAWINGS
The various objects and advantages of the present invention will become apparent to those of ordinary skill in the relevant art after reviewing the following detailed description and accompanying drawings, wherein:
Fig. 1 is a block diagram of the physical interpretation of the Hard Disk Drive (HDD), called the physical file system.
Fig. 2 is a block diagram depicting the logical interpretation of the HDD, called the logical file system.
Figs. 3a, 4a and 5a show how legacy file systems 300 are organized and store data.
Figs. 3b, 4b, and 5b show how an embodiment of a VFS is organized and stores data.
Figs. 6 and 7 show a block diagram and the corresponding flow representation of one embodiment of a VFS of the invention.
Fig. 8 is a diagram depicting the usage of command interpretation to SQL, that is, how the operations that can be performed on a file system can be translated to database operations.
Fig. 9 is a diagram depicting the functionality of the "Connect" and "DISCONNECT"
Figs. 10 and 11 depict the various File and Directory commands translated to corresponding database query statements.
Fig. 12 illustrates an embodiment of the components required for the working of the VFS.
Fig. 13 describes a flow chart representing embodiments of the invention.
Fig. 14 illustrates an embodiment of the data definition tables.
Fig. 15 illustrates a flow chart for translating a command.
DETAILED DESCRIPTION OF THE INVENTION
The present invention may be embodied in several forms, structures and manners.
The description provided below and the drawings show an exemplary embodiment of the invention translating commands in one language to SQL. Those of skill in the art will appreciate that the invention may be embodied in other forms, structures and manners not shown below, including translating commands to any table language. The invention shall have the full scope of the claims and is not to be limited by the embodiments shown below. In the present disclosure, the words "a" or "an" are to be taken to include both the singular and the plural. Conversely, any reference to plural items shall, where appropriate, include the singular.
The invention pertains to a file system in an operating system and, in particular, pertains to storing and utilizing data in a table structure format that is database compliant. Embodiments of the present invention utilize the searching, storage and other database benefits in the context of an operating system. Such a database formulated file system is referred to as a Virtual File System ("VFS").
In a first embodiment, the VFS is used in an OS so that the OS file system stores file data in a table structure format, which is a database compliant one and, in an alternative embodiment, an Open Database Connectivity (ODBC) compliant one. The VFS facilitates the use of a table language, i.e., a language that can act on data stored in a table structure format. Table languages, such as SQL, are generally functionally rich languages and provide improved data access and manipulation functionalities and performance. One reason that table languages provide improved functionality and performance is that they act on data stored in a table structure format, which typically includes tables containing records and incorporates hashing functions.
In the first embodiment, when new data is stored to the VFS, the VFS creates a unique hash, which is linked to a particular cell of the data matrix stored on the media. Data access is, therefore, improved because the number of records to be traversed when locating a file is reduced. For example, to access the record of Mr. John, files beginning with "J" and then "Jo" are searched based on the unique hash, but files beginning with other letters such as "S" are not. Thus, the need to traverse through Mr. Smith's data is eliminated; and, to add another field in Mr. Smith's record, the need to traverse through other fields associated with Mr. Smith is eliminated. This is done using the hash that can be used to directly access the cell required in the data matrix.
In previous systems, the operating system stored application files as separate files without hashing function incorporation and, therefore, required traversing through unnecessary files on the media to access or search for a specific application file. Even though database applications naturally create data in table and record format and data that incorporates hashing functions, legacy file systems store such data in file formats, which can only be acted on by file system languages, such as OQL. And even though database applications naturally send commands to an OS in table languages (such as SQL), the commands must be translated to an OS compliant language to act upon the data, which is stored in a file format. Because the OS languages do not have the flexibility of the table languages, the potential to utilize table language flexibility on table and record formatted data that incorporates hashing functions is lost. Further, the additional translation from SQL to OQL substantially slows file retrieval, storage and manipulation processes.
In a second embodiment of the invention, commands received from applications to act upon the data stored in the data matrix are translated to commands in a table language. As a result, the improved flexibility and performance offered by table languages are realized even for legacy applications that send commands in file system languages, such as OQL.
A third embodiment of the invention involves reprogramming applications using APIs provided by the VFS. The APIs allow users to specify data definitions for data to be stored. These data definitions include, for example, the characteristics that the data has been associated with. For example, the data definition table may include data definition sub- structures (DDSS) such as "Employee Name" and "Employee ID," and the actual file data can be the names of the employees such as "Mr. Smith", "Mr. John" and employee identifications such as "1234" or "5678". Applications utilizing the same DDSSs can share data. For example, if the "Employee Name" DDSS is created by a first application, a second application also using the "Employee Name" DDSS can use the "Employee Name" data (for example, "Mr. Smith" and "Mr. John"). Thus, pursuant to the third embodiment, an application directly uses data created by a different application without the use of middleware or clipboards. Improved data sharing is achieved. This enables the application to retrieve data from the media as well as formatting and other like information stored by the other application in the data definition tables.
In a fourth embodiment, the applications may be reprogrammed to send commands to the file system in a table language. Thus, in variations of the fourth embodiment, the benefits of table languages may be realized without translation.
Figs. 3a, 4a and 5a show how legacy file systems 300 are organized and store data. In the legacy OS file system 300 of Fig. 3a, the hierarchical structure comprises a Disk 310 (or long term memory), which is typically partitioned. Each partition or sub-disk 315 contains a root 320, which contains basic file specific information such as file name, file size, user privilege information and the like. Most file systems also have a File Allocation Table ("FAT") (Fig. 4a) that contains pointers for each file that is created. The pointers point to the starting location 520 of the clusters 515 (Fig. 5a) on which the file is saved. The user space, which is the entity where the data resides, is made up of directories 325. Files may be stored in parent directories 325 or in sub-directories 330, into which the directories are divided. The directories 325 and sub-directories 330 serve as containers for holding one or more files.
Legacy file systems receive data from applications and translate the file data into a file format, as depicted in Fig. 4a. File information is stored in the root directory entry 405. The Master Table or ROOT 410 contains values such as File ID 415, Name 420, Path 425, Size 430, Attributes 435. Further, the Transaction Table 440 comprises File ID 445 as well as File Data 450. This File ID 445 is a foreign key that contains pointers to the start of the files stored on the media, called the file data. The ROOT puts several constraints on certain fields, such as proscribing files having the same name. Information about a file object (that is, name, size, date, time, etc.) is saved in the root directory entries with respect to the object's parent. Hence, all files in a file system are always associated with their current logical location and their referential parent location.
In legacy systems, data is stored in a file format in a long term memory having a disk by way of a proprietary file system. The disk comprises numbered sectors 505. Each sector 505 on the disk is, for example, 512 bytes. In one legacy system, four sectors comprise a cluster 515, and each cluster 515 is two kilobytes in size. The Beginning of the File (BOF) 520 and End of the File (EOF) 525 are 10 kilobytes apart. When the file is opened in Read/Write mode 530 by way of the file handle offset, the file is opened from the BOF 520, whereas when the file is opened in an append mode 535 by way of the file handle offset, the file is opened from the EOF 525. The ROOT contains specific file information, such as file size, file name, and other file information. The FAT comprises cluster information, which is generally a unique pointer pointing to the start of each individual file.
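The sector and cluster figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the example values given in the text (512-byte sectors, four sectors per cluster, a 10 kilobyte file); the function names and the mode strings are hypothetical and are not part of any legacy file system API.

```python
SECTOR_BYTES = 512          # example sector size from the text
SECTORS_PER_CLUSTER = 4     # example cluster geometry from the text
CLUSTER_BYTES = SECTOR_BYTES * SECTORS_PER_CLUSTER


def clusters_needed(file_bytes):
    """Number of whole clusters required to hold a file of the given size."""
    # Ceiling division: a partly filled cluster still occupies a full cluster.
    return -(-file_bytes // CLUSTER_BYTES)


def initial_offset(mode, file_bytes):
    """Starting file-handle offset: BOF for read/write, EOF for append."""
    if mode == "rw":
        return 0
    if mode == "append":
        return file_bytes
    raise ValueError("unknown mode: " + mode)


file_bytes = 10 * 1024      # BOF and EOF are 10 kilobytes apart
file_clusters = clusters_needed(file_bytes)
```

With these values a cluster is 2048 bytes and the 10 kilobyte file spans five clusters, which is consistent with the two-kilobyte cluster size stated above.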
Because the file data received from the applications is translated into a file format in legacy file systems, legacy file systems are restricted to primitive OS languages to act upon the data, i.e., to store, retrieve, delete and modify the data. Thus, in legacy systems, the way in which a user may act upon data is limited.
Figs. 3b and 14 show how an embodiment of a VFS is organized and stores data. The VFS embodiment shown in Fig. 3b comprises a data matrix 1462 (labeled a database 340 in Fig. 3b) that is partitioned into one or more tablespaces 345. Each tablespace 345 comprises Extents 350, which comprise blocks 355. Each block 355 comprises one or more tuples 360, and each tuple comprises one or more cells. In one embodiment, the entire file system is stored as one entire data matrix 1462 spread over multiple clusters. The structures used to implement the data matrix 1462 include a B-tree, a two-dimensional array and the like. In one embodiment, the primary key (unique hash) for file data that is to be stored in a cell of the data matrix 1462 can be interpreted as a combination of the path of the parent container (which would be files beginning with "J" in the example provided above) and the name of the file object itself (which would be "Mr. John" in the example provided above). The unique hash provides a unique identification number that can be used to access the file data.
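By way of illustration only, a key built from the parent container path and the object name might be derived as in the following sketch. The use of SHA-256, the NUL separator, the dictionary standing in for the data matrix, and all function names are assumptions made for this example; the disclosure does not prescribe a particular hashing function.

```python
import hashlib


def unique_hash(parent_path, name):
    """Combine the parent container path and the object name into one key."""
    # The NUL separator is an assumption; it keeps ("/a", "b/c") and
    # ("/a/b", "c") from producing the same input string.
    material = parent_path.rstrip("/") + "\0" + name
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Stand-in for the data matrix: each unique hash addresses one cell.
data_matrix = {}


def store(parent_path, name, data):
    key = unique_hash(parent_path, name)
    data_matrix[key] = data
    return key


def fetch(parent_path, name):
    # Direct access by key; no other record is traversed.
    return data_matrix[unique_hash(parent_path, name)]


store("/employees/J", "Mr. John", b"John's record")
store("/employees/S", "Mr. Smith", b"Smith's record")
```

Fetching the record for "Mr. John" computes one key and reads one cell, so the record for "Mr. Smith" is never visited.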
In the embodiment of Fig. 12, the VFS is implemented on any system comprising a processor 1210, such as a microprocessor, long term memory, such as a disk 1230, and a short term memory, such as a Random Access Memory ("RAM") 1220. The system may include, for example, a computer, a client or server within a client-server environment, a personal digital assistant, a mobile phone, and the like. The VFS is stored on the long term memory 1230 and the components of the VFS (described below) are transferred to the RAM 1220 as and when required. The components of the VFS instruct the processor 1210 how and when to act upon the data. The data is stored on the long term memory 1230 and is swapped between the long 1230 and short term 1220 memories when required.
Figs. 6 and 7 show a block diagram and the corresponding flow representation of one embodiment of a VFS of the invention. The Network Agent 675 interfaces with other systems in a client server environment. It has the functionality to carry user command packets to the kernel message pipe across various protocol layers.
When a command is received at the VFS, the Command Analyzer 605 is responsible for analyzing the command to be performed on the VFS. The operating system supports applications and associates each application with data generated by the application in a file. The Command Analyzer 605 associates and executes the application based on the header information. Since the VFS has operating system files persisted in a database compliant format, particularly an ODBC / OLEDB compliant format, the commands require translation to SQL to access and manipulate the data. The Command Analyzer 605 includes a language translation table (described in greater detail at the "Language Translation Table Description" section) and translates the OQL to SQL.
The Command Analyzer 605 sends the command to the Option Validator 625. The Option Validator 625 is responsible for verification of the arguments associated with the command. The Operation Analyzer 630 receives the command from the Option Validator 625 and checks the command for syntactic errors. It estimates the various resource requirements that would be required by the command for execution. A resource that is currently in a state of exclusive use may be required for current command execution and is monitored by the Operation Analyzer 630 for availability. The Operation Analyzer 630 may defer or stall the execution of the current command based on user privileges or command execution hierarchy. The ODBC or OLEDB driver 610 illustrated is responsible for connecting the applications to the VFS. The Scripting and Parsing Engine 615 allows the VFS to communicate with legacy applications and provides interoperability across different legacy applications. In one embodiment, legacy applications communicate with the VFS by way of their proprietary commands. The Scripting and Parsing Engine 615 also provides applications access to data stored on the VFS without kernel level API usage. Commands from legacy file systems, such as New Technology File System (NTFS) from Microsoft or High Performance File System (HPFS) from IBM, are translated to a table language by the File System Translator module.
The Audit or Security Manager 635 checks whether the user accessing the database has been approved with privileges to access or modify the file data. Before the command is executed, the Resource Manager 640 allocates and de-allocates resources required by the command to ensure smooth functioning. The File Translator 650 is responsible for creating the unique hash and storing the data referred to in the command to a particular cell within the data matrix 1462. Because the VFS stores data in a table structure format that is database compliant, data received from legacy applications is translated into the table structure and stored on the VFS in the table structure format. The File Translator 650 performs these functionalities. Once the data has been translated, the Disk Agent 665 is responsible for storing the data in the data matrix 1462 on the media. All long term memory related activities, such as fetching records and reading, writing or updating data, are managed by the Disk Agent 665. The Disk Agent 665 manages these operations in conjunction with the Cache Manager 655. The Cache Manager 655 preserves and maintains clean buffers to store temporary data, i.e., data between translations and between the long and short term memories, as per the instruction of the Execution Engine 645. The Disk Agent 665 also manages the swapping between resources, such as between long and short term memories, when the Resource Manager runs out of short term memory and requests a swap. When an operating system file system is replaced by the VFS, the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
The Execution Engine 645 executes the SQL query. The translated command is received by the Execution Engine and executed to act on the data. The Execution Engine 645 also instructs the Cache Manager to store or release data based on the need of the query. The Error Handler 670 triggers an error when execution of the query fails.
Note that, in addition to the VFS components described above being implemented on a system, as provided in the embodiment described above, the VFS components may also be stored on a portable media, such as optical and magnetic discs and memory chips, in other embodiments. The portable media may be used to transfer the VFS to a system.
Fig. 13 describes a flow chart representing embodiments of the invention. In one embodiment, a legacy operating system file system can be replaced by the VFS; in another embodiment, the VFS can be installed on a legacy operating system file system with capabilities to override the legacy file system's functionalities 1310, so that the legacy system with the VFS of the invention stored thereon stores data received from applications in table structure format. In the first embodiment, legacy applications have not been reprogrammed using the VFS APIs available, and therefore cannot make use of data definition tables. In one scenario, applications such as word processors, spreadsheets, drawing and paint applications send data to the VFS file system to be stored on the long-term memory 1230 at step 1320. The VFS file system translates the data (step 1330) into table structure format. In one embodiment, the table structure format includes tables having records. As shown in Fig. 14, once the data is received by the VFS, it creates a unique hash, which is stored in a Lookup table 1454. One such example is when the primary key (unique hash) 1456, 1458, 1460 for the table is created using a combination of the path of the parent directory and the name of the file object itself. Those of skill in the art shall appreciate that there are numerous ways that such hashes can be created.
In the VFS 540 shown in Fig. 5, each file is comprised of extents. The data matrix 1462 comprises headers 545, which include data definition information regarding the version of the data matrix 1462, how the VFS stores information 545 and other metadata information. The extents further comprise a block header 545, a tuple index and tuples. The tuples contain the actual file data in the form of LOB data 560. The tuples are assigned unique ROWIDs, and each row comprises one or more cells. The media comprises the data matrix 1462, which behaves like a single database file. The physical persistence of this data is in a table record, and the actual file data is saved as LOB data. LOB data is the actual file data, which legacy systems save in a series of linked clusters and which VFSs of the invention save within cells of the data matrix 1462.
The data is stored 1340 in the cell associated with the unique hash. When the data needs to be accessed or modified, the hash 1446, 1450 is obtained from the lookup table 1454. The data is obtained from the cell associated with the unique hash and sent to the application from the VFS. Because a unique hash 1446, 1450 in the lookup table identifies the cells in which the data desired by a user is stored, file data retrieval is improved. A similar interpretation of any database object is saved in a Free Extent Table (FET), showing free space available, and a Used Extent Table (UET), showing space already allocated.
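The store and retrieve path described above can be sketched with an in-memory SQLite database standing in for the data matrix. This is a minimal illustration only: the table names, column names, the SHA-256 hash and both function names are assumptions for the example and do not reflect the actual layout of Fig. 14.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (
        parent_path TEXT NOT NULL,
        name        TEXT NOT NULL,
        hash        TEXT NOT NULL UNIQUE,
        PRIMARY KEY (parent_path, name)
    );
    CREATE TABLE data_matrix (
        hash TEXT PRIMARY KEY,
        lob  BLOB NOT NULL
    );
""")


def store_file(parent_path, name, data):
    """Create the unique hash, record it in the lookup table, store the LOB."""
    material = parent_path + "\0" + name
    h = hashlib.sha256(material.encode("utf-8")).hexdigest()
    with conn:  # commit both inserts together
        conn.execute("INSERT INTO lookup VALUES (?, ?, ?)", (parent_path, name, h))
        conn.execute("INSERT INTO data_matrix VALUES (?, ?)", (h, data))
    return h


def read_file(parent_path, name):
    """Obtain the hash from the lookup table, then read the matching cell."""
    row = conn.execute(
        "SELECT d.lob FROM lookup AS l "
        "JOIN data_matrix AS d ON d.hash = l.hash "
        "WHERE l.parent_path = ? AND l.name = ?",
        (parent_path, name),
    ).fetchone()
    return None if row is None else row[0]


store_file("/docs", "letter.txt", b"Dear Mr. Smith")
```

Because both tables are keyed, the read is a keyed lookup followed by a keyed fetch rather than a scan over unrelated files.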
Database applications create data in tables having records. Legacy operating systems translate data created by the database applications into a file format and store the file data in the file format. This involves an unnecessary level of translation. For example, reading, writing or seeking a file in a legacy file system is reconfigured and translated by the legacy system into device driver calls, which go to the Basic Input Output System (BIOS) and finally to the physical sector with a read/write/seek. Database applications send SQL or other database specific queries to the OS, and the queries are converted to either a read or a write operation. The operating system understands the read or write operation to either access or retrieve data as per the command. SQL is a functionally rich language providing SQL users with substantial flexibility to access and modify data. Legacy operating system file systems failed to provide such flexibility, and users were restricted to using primitive commands such as read, write, open and close to act on the data. Only after the file data was retrieved from the legacy file system was the SQL functionality realized.
Legacy operating systems, however, do not allow direct language support. Instead, legacy operating systems only allow certain batch commands or shell scripts to be executed, which are normally passive. These scripts are preprogrammed and need to be constantly running in order to be executed. The scripts or commands are interpreted by the command kernel and executed when needed.
Because the present invention stores file data 1340 in a table structure format that is database compliant and, in one embodiment, ODBC compliant, there is no need to translate file data (which is in the form of tables and records) received from a database application when storing the database file data or to translate database queries from a database language to a table language. The advantage in this case is the elimination of the double translation that would be required between the database and the operating system and back since the operating system itself stores data in a database compliant format, particularly an ODBC compliant format. Pursuant to embodiments of the invention, all applications can avail themselves to the benefits of table languages such as SQL by way of translating commands in other languages to table languages such as SQL. (Translation of commands is described in greater detail below.)
In another embodiment of the present invention, in addition to using a VFS, applications are programmed using APIs provided by VFS. The VFS APIs allow applications to write data definitions in a global data definition table 1402. The data definition tables serve as metadata for the data stored in the records and tables.
An example of this could be functions such as "frdefine" and "fwdefine". When an application programmer needs to write data into the data definition table, "fwdefine" with certain arguments can be used. On the other hand, to read data from the data definition table, "frdefine" can be used. Those of skill in the art will appreciate that the names provided for the functions may vary, as long as the functionality required to be performed remains. To illustrate, let us consider Microsoft Word as an application being reprogrammed using VFS APIs and working on the VFS. In one embodiment, the data definition table 1410 corresponding to Microsoft Word can be broken down into data definition sub-structures 1412, 1416, 1420, 1424, 1428, 1432 (DDSS) such as "Document", "Page", "Paragraph", "Line", "Word" and "Character," as shown in Fig. 14. In one embodiment, there is a data definition table type 1410 corresponding to each application, and each corresponding data definition table includes DDSSs 1412, 1416, 1420, 1424, 1428, 1432 that are based on the functionality of the application. Thus, there will be a Microsoft Word data definition table type 1410 including the DDSSs 1412, 1416, 1420, 1424, 1428, 1432 provided above. The data definition table types 1410 for the applications running on a system are combined in a global data definition table 1402. The sub-structure information for individual files is stored as individual file specific sub-structures (FSSS) 1414, 1418, 1422, 1426, 1430, 1434 within each of the DDSSs 1412, 1416, 1420, 1424, 1428, 1432. In one embodiment, each cell includes pointers to the actual file data, which is stored on the media in table structure format using records and tables. There are several advantages of providing data definition tables. Applications that need to use data created by another application may have similar DDSSs.
For example, Microsoft Excel has "Document" and "Page" DDSSs 1438, 1442 in addition to other DDSSs, such as "Row" and "Column" 1446, 1450, to facilitate Excel functionality. The DDSSs allow data exchange between different applications. For example, Microsoft Excel can directly use Microsoft Word data, such as information associated with the "Document" and "Page" DDSSs, without the use of middleware applications or clipboards. Those of skill in the art will appreciate that database definition table types may be configured for several different applications and objects, such as, for example, spread sheets, drawing applications, web sites and numerous others. Another example showing how data is stored is the procedure for storing an image file, such as a "Bitmap File". The Bitmap File can either be stored as Large Object ("LOB") data or can be broken into rows of pixels or columns of pixels. If the Bitmap is stored as LOB data, the data definition table type for Bitmap images can include information regarding attributes associated with the image such as color information, size information, and the like. In one embodiment, the LOB data is stored in a single cell of the data matrix 1462. In another embodiment, the image is broken down into a matrix of pictures and stored on the media as a row of pixels, and the DDSSs can contain information regarding each pixel row of the image.
Another advantage is the storage of intelligent information regarding the actual file data. A Microsoft Word document can store individual information regarding each substructure in file specific substructures within the sub-structures. For example, the paragraph structure 1422 can contain information regarding the number of lines in the paragraph, the indentations to be applied to the paragraph etc. of each document. Therefore, while exchanging only parts of the data, the formatting information of that paragraph can be retained. This is possible as the information regarding the data is stored in the data definition tables 1410. Also the application itself can access only parts of its own stored data.
The archival of file data content into a table structure format, i.e., into tables and records, and preferably into an ODBC compliant RDBMS format with data definition tables, allows for data exchange across applications independent of operating system file formats. This data exchange is possible when applications are modified and built using VFS APIs. The VFS APIs allow programmers to use data definitions. These data definitions are stored in a global data definition table 1402. To exchange data between applications, one application reads the data definitions and using the data definitions, understands and retrieves data.
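The exchange of data through shared data definitions can be sketched as follows. The function names "fwdefine" and "frdefine" are taken from the description above, but their signatures, the nested dictionary used as the global data definition table, and the example file and attribute names are all assumptions made for illustration.

```python
# Global data definition table (assumed shape):
# DDSS name -> {file id -> file specific sub-structure (FSSS)}.
global_definitions = {}


def fwdefine(ddss, file_id, fsss):
    """Hypothetical write call: record an FSSS for a file under a DDSS."""
    global_definitions.setdefault(ddss, {})[file_id] = dict(fsss)


def frdefine(ddss, file_id):
    """Hypothetical read call: return a copy of the FSSS, or None if absent."""
    entry = global_definitions.get(ddss, {}).get(file_id)
    return None if entry is None else dict(entry)


# A word processor records sub-structure information for one document.
fwdefine("Document", "report.doc", {"pages": 3})
fwdefine("Paragraph", "report.doc", {"lines": 4, "indent": 2})

# A spreadsheet that also uses the "Document" DDSS reads the same
# definition directly, without middleware or a clipboard.
shared = frdefine("Document", "report.doc")
```

In this sketch the second application needs to know only the DDSS name it shares with the first; a DDSS it does not share, such as "Row", simply returns nothing for that file.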
Data exchange is useful in business process automation because data integration, data sharing and presentation formats are more structured in an RDBMS than they are across custom built applications and are also more structured than are tools for performing specific tasks. Such tools incorporate a specific proprietary format. For example, manufacturing and engineering units have a lot of process control instrumentation applications, which dictate the production cycle. Commercial units of the business, however, do not need the technical data generated in the production cycle. Commercial units only require Management Information System (MIS) feedback. Translating the technical data to management specific information is normally a very tedious process. The data exchange across application specific tools requires middleware, and the reengineering process, which implicates understanding the source and target formats and the business process requirements, is cumbersome.
In another embodiment of the invention, the VFS APIs also provide table language functionality to the applications so that the applications send commands in a table language format. In the fourth embodiment, translation of a language to a table language is not necessary because the commands are already in a table language.
Pursuant to embodiments of the invention, the translation is obviated. Several commands and queries can be accepted, and the data associated with the query can be stored in table structures, and preferably in an ODBC compliant format. Users and programmers may use rich languages, such as SQL, to access and manipulate the data without undergoing multiple levels of translation.
To enhance data accessing mechanisms, legacy file systems utilized caching utilities, such as SmartDrive, Norton Change Directory, and FindFast. The caching utilities store data in the cache for predetermined time periods or for a period of time that is based on the amount of data usage. This eliminates the need to fetch data from the long term memory 1230 every time. Pursuant to an embodiment of the invention, the physical file data is indexed and persisted into a database by the operating system. As a result, the dependency of following the operating system specific file system and multiple levels of translation is avoided. Pursuant to another embodiment of the invention, commands received in one language are translated to a table language, such as SQL, if the language the command is received in is not a table language. Fig. 15 illustrates a flow chart for translating a command. The commands received in an OS language, such as OQL, are translated to a table language, such as SQL. An example of the translation is described below. The translation process is described in the context of an application client connecting and retrieving data from the VFS. Those of skill in the art will appreciate, however, that in other embodiments the translation process may also take place in stand alone environments.
Fig. 7 is a flow chart of an embodiment of the invention in a context. The flow particularly depicts an embodiment of the current invention, wherein an application client connects and retrieves data from the VFS. The client is a legacy application such as a File Transfer Protocol ("FTP") client that is not configured to fire SQL queries. In a passive state, step 700, the VFS continuously monitors for client requests. Once a client request for a connection is received, a connection is set up and commands sent by the client are analyzed using the Command Analyzer 605. The step of analyzing the command 702 comprises identifying and segregating each command into either a File Command 704 or a Directory command 708. The Command Analyzer 605 identifies the application associated with a particular file based on information stored in the header or data definition tables. For example, "Text files" are associated with Notepad in Microsoft Windows and "BMP" files are associated with Microsoft Paint as a default painting tool. The Command Analyzer 605, in conjunction with scripting interfaces, can be customized to execute various file system commands and also provides for operating system file system interoperability. For example, the "DIR" command in DOS, with its various argument options, has equivalents such as "ls -l /a/d" in Linux or UNIX.
If the command, step 702, has been classified and identified as a file command at step 704, the Operation Analyzer 630 carries out further processing. The Operation Analyzer 630 validates the correctness of the command from a syntactic standpoint. Since specific commands may require a lot of processing power and utilize additional time and system resources, the Operation Analyzer 630 may buffer or stall the execution of that command based on user privileges or command execution hierarchy. For example, a directory search operation to list all the directories may require traversing the file system and various partitions, which will be overridden by a shutdown command. The Option Validator 625 verifies the arguments specified in the command when commands are given in conjunction with other file system commands. It also verifies the arguments of the command when the result of one command serves as an argument for another command to be executed. The Option Validator 625 is also responsible for estimating the resource requirements for the command to be executed and maintains a track of current resource availability.
The commands are identified as a file command, step 704, or a directory command, step 708. If any part of the process fails to be executed, at step 710, the Error Handler generates an appropriate error message.
At step 712, if necessary, the Command Analyzer translates the command from the language in which it is received into a table language. In the embodiment shown, the command is translated from OQL to SQL. The translation is necessary when an application sends a command to the VFS in a language other than a table language that acts on a database. For example, an FTP client may request a "RENAME" command (which is an OQL command) to rename a file or folder on the VFS. The OQL command will be translated to a corresponding SQL command such as "UPDATE" with the appropriate arguments and be executed on the VFS. If the commands were received in a table language capable of acting on data stored in table structure format, such as SQL, however, there would be no translation.
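The "RENAME" to "UPDATE" example can be made concrete with a small sketch that runs against an in-memory SQLite table. The "files" table, its columns, the function name and the notion of a current directory argument are assumptions for illustration; the disclosure's own language translation table is described separately and is not reproduced here.

```python
import shlex
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (parent_path TEXT, name TEXT, "
    "PRIMARY KEY (parent_path, name))"
)
conn.execute("INSERT INTO files VALUES ('/docs', 'old.txt')")


def translate(command, cwd):
    """Map a file-system style command to an (sql, parameters) pair."""
    verb, *args = shlex.split(command)
    if verb.upper() == "RENAME" and len(args) == 2:
        old, new = args
        sql = "UPDATE files SET name = ? WHERE parent_path = ? AND name = ?"
        return sql, (new, cwd, old)
    raise ValueError("no translation for: " + command)


sql, params = translate("RENAME old.txt new.txt", "/docs")
conn.execute(sql, params)
names = [row[0] for row in conn.execute("SELECT name FROM files")]
```

Passing the file names as bound parameters rather than splicing them into the SQL text keeps names containing quotes or other special characters from altering the translated statement.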
The method of translation includes the Command Analyzer 605 analyzing whether the command requested is a DDL operation such as "Create" or "Alter" or "Drop" at step 714, or a DML operation such as "Select" or "Insert" or "Update" or "Delete" at step 728. While performing DDL operations at step 714, the Audit Manager 635 checks for user privileges at step 716. In the event it is a DML operation, step 728, the Command Analyzer 605 checks for any application associated with the command, step 730. (The Command Analyzer utilizes the language translation table, described below, when translating commands.) The Option Validator 625 checks for existence of the application, step 732, and the Execution Engine executes the command on the application with appropriate parameters, step 734.
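The DDL/DML branch at steps 714 and 728 amounts to classifying a statement by its leading keyword. A minimal sketch follows, using only the operations named in the text; the function name and the "UNKNOWN" result are assumptions for the example.

```python
DDL_VERBS = {"CREATE", "ALTER", "DROP"}
DML_VERBS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def classify(statement):
    """Return "DDL", "DML" or "UNKNOWN" based on the leading keyword."""
    words = statement.split()
    verb = words[0].upper() if words else ""
    if verb in DDL_VERBS:
        return "DDL"   # step 714: privileges are checked next (step 716)
    if verb in DML_VERBS:
        return "DML"   # step 728: the associated application is checked next
    return "UNKNOWN"


kind = classify("UPDATE files SET name = 'new.txt' WHERE name = 'old.txt'")
```

A real analyzer would need a proper parser to handle comments and compound statements; keyword matching is used here only to show the branch.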
Those of skill in the art will appreciate that, although the depicted embodiment shows examples of translating a command from OQL to SQL, commands received by the VFS in any language may be translated into other table languages that are capable of acting on data stored in table structure format. Such languages are typically functionally rich and provide greater flexibility in the manner which data stored on the system may be acted upon, i.e., stored, modified, retrieved or deleted.
The Audit or Security Manager 635 verifies whether the user requesting the command has the necessary rights to execute the command. In the event the user does not have appropriate privileges, the Error Handler triggers an error at step 710. If the user has been granted access to execute commands on the VFS, the Option Validator 625 proceeds to check the arguments specified in the command at step 720. If the Option Validator finds invalid options in the command, the Error Handler is triggered to generate an appropriate error at step 710. In the event the Option Validator 625 finds valid arguments within the command, the File Translator 650 proceeds to execute the arguments, which may include storing data identified in the argument, step 724. The File Translator 650 stores the actual command argument object to a particular cell(s) or ROWID (a unique identifier for each ROW) in the data matrix 1462 on the media. Because the data is stored as a table structure, the data need not be translated for storage into the data matrix 1462 when it is provided in a table structure format. The data is stored in an appropriate cell based on a unique hash that is generated. When applications are written using the VFS APIs, they can specify data definition information in a data definition table. The data is stored at an appropriate location in the data matrix 1462 based on a unique hash. The File Translator also works backwards in translating data retrieved from the media to application-specific data formats as per the command.
The final storage of data on the media in a table structure format and, in one embodiment, an ODBC compliant database format, step 726, is performed by the Disk Agent 665. All disk related activities with respect to fetching records, reading, writing or updating data, irrespective of the user, are centrally managed by the Disk Agent 665. Other modules of the VFS, which need data of any file not present in the cache, interact with the Disk Agent 665 for persistence to or retrieval of data from the VFS. The Disk Agent 665 also manages secondary storage when primary storage, such as RAM, is exhausted, i.e., when the Resource Manager runs out of RAM and demands swap to store the data onto the media. When an operating system file system is replaced by the VFS, the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
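The hash-based placement of data in a cell of the data matrix can be sketched as below. The specification does not say which hash is used or what it is computed over; hashing the file path with SHA-256 and modelling the data matrix as a dictionary are assumptions made purely for illustration.

```python
import hashlib


def row_id(path):
    """Derive a stable row identifier from a path (SHA-256 is an assumption)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]


# Hypothetical stand-in for the data matrix: ROWID -> {column: value}.
data_matrix = {}


def store(path, content):
    """Place the content in the cell selected by the hash of the path."""
    rid = row_id(path)
    data_matrix[rid] = {"path": path, "content": content}
    return rid


def fetch(path):
    """Recompute the hash to locate the row again."""
    return data_matrix[row_id(path)]["content"]


rid = store("/docs/report.txt", b"quarterly numbers")
```

Because the identifier is recomputed from the path, retrieval needs no directory traversal, which is the practical appeal of hash-addressed cells.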
The Execution Engine 645 is equipped with the logic for executing operating system related functionality received from operating system related commands. It translates the user commands (with or without arguments) and performs the execution. The command, which may be an internally fired SQL query, is handled in the same way as in a database query execution engine. The execution may result in some storage of data, which may be part of the VFS data definition table. For example, a command like "CD" (change directory) is intended to change the current directory, whereas an "MD" (MKDIR) command with arguments will result in the actual creation of a directory. The command for the creation of a directory is analogous to an "INSERT" statement in SQL. This command is interpreted by the VFS and checked for certain constraint validations. For example, creating a directory with the same name as an existing directory in the same parent directory would not be permitted and would generate an error. The Execution Engine 645 is responsible for execution of the SQL queries that have been generated.
To provide support for multiple languages to be used while communicating with the database, the Scripting and Parsing Engine 615 provides universal file system command interoperability across operating systems. For example, VFS provides support for a VFS communicating with a legacy file system and applications, or a VFS communicating with another VFS file system and applications built using VFS APIs. This is done by parsing and analyzing commands received from applications and translating them to queries that VFS can understand. Interoperability between VFS and legacy file systems is provided for by a file system translator 620. In one embodiment, interoperability is provided for numerous file systems including the New Technology File System (NTFS) from Microsoft and the High Performance File System (HPFS) from IBM.
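The "MD" to "INSERT" analogy, together with the duplicate-name constraint, can be sketched with a uniqueness constraint on the parent and name columns. The `entries` table, its columns, and the function `make_directory` are hypothetical; SQLite stands in for the VFS database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per file or directory entry.
conn.execute(
    """CREATE TABLE entries (
           parent TEXT NOT NULL,
           name   TEXT NOT NULL,
           is_dir INTEGER NOT NULL,
           UNIQUE (parent, name)
       )"""
)


def make_directory(parent, name):
    """Execute 'MD <name>' as an INSERT; duplicates are rejected."""
    try:
        conn.execute(
            "INSERT INTO entries (parent, name, is_dir) VALUES (?, ?, 1)",
            (parent, name),
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "error: %r already exists in %r" % (name, parent)


first = make_directory("/", "docs")
second = make_directory("/", "docs")     # same name, same parent
third = make_directory("/docs", "docs")  # same name, different parent
```

Here the database constraint does the validation that a conventional file system performs in its own directory-handling code.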
At each step, if the operation fails, an appropriate error message is triggered and displayed on the screen. On completing the entire process, the system proceeds to return control to the VFS, step 740 and waits for the next client request.
For interfacing other machines or other file systems to the VFS system, a Network Agent module 675 is supported, which exhibits functionality to carry user commands through the network stack across various protocol layers.
LANGUAGE TRANSLATION TABLE DESCRIPTION
Fig. 8 is a diagram of an embodiment of a language translation table of the current invention depicting the interpretation of OQL commands as SQL queries and vice-versa.
Consider a Legacy file system 800, which is in communication with the VFS. The legacy file system uses primitive file and directory commands which need to be interpreted for accessing data on the VFS. This interpretation involves translating an OS language, such as an Object Query Language (OQL), to a table language, such as a Structured Query Language (SQL) 805. A table including commands from the OS language and corresponding commands in a table language is provided below.
For example, the "DIR" command 810 with various options such as "/OS/OD/OA/ad" is interpreted as a "SELECT" command 815 specifying arguments for accessing a specific row or column. Similarly, the file "COPY" command 820 with a source and target argument is interpreted after the OQL to SQL translation as either an "INSERT" or "UPDATE" statement 825. Similarly, a "DEL" command for deleting a file 830, with the path of the filename as an argument, is interpreted as a "DELETE" 835 operation in SQL with the specific ROWID or COLUMNID as an argument. Similarly, the "RENAME" command 840 translates to an "UPDATE" statement 845. The "SEARCH" or "FIND" command 850 translates to a "SELECT" statement 855. A command for making a directory in a file system, usually executed as "MKDIR" or "MD" with the directory name as an argument 865, creates a directory entry in the root with respect to its current parent path. Hence an "INSERT" query can create the directory entry, but the interpretation of the command ("MKDIR" / "MD") into an "INSERT" query has to be fortified by a logical interpretation of the file attributes, which separates the file from a directory in a standard file system. Failures in any such operations also need to be tracked. For example, if a directory or file with the same name exists in the same parent path, an error message is triggered as per normal operating system file system logic.
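The command mappings listed above for Fig. 8 can be collected into a lookup structure. The dictionary below restates only the pairs named in the description; the names `TRANSLATION_TABLE` and `lookup` and the use of tuples for commands with more than one possible translation are illustrative assumptions.

```python
# OS-language command -> candidate table-language statements (from Fig. 8).
TRANSLATION_TABLE = {
    "DIR": ("SELECT",),
    "COPY": ("INSERT", "UPDATE"),
    "DEL": ("DELETE",),
    "RENAME": ("UPDATE",),
    "SEARCH": ("SELECT",),
    "FIND": ("SELECT",),
    "MKDIR": ("INSERT",),
    "MD": ("INSERT",),
}


def lookup(command_line):
    """Return the SQL statement kinds a command line may translate to."""
    tokens = command_line.split()
    if not tokens:
        raise ValueError("empty command")
    name = tokens[0].upper()
    if name not in TRANSLATION_TABLE:
        raise KeyError("no translation for %r" % name)
    return TRANSLATION_TABLE[name]


dir_sql = lookup("dir /OS")
copy_sql = lookup("COPY a.txt b.txt")
md_sql = lookup("md docs")
```

"COPY" maps to two candidates because, as the description notes, the result is an "INSERT" or an "UPDATE"; a plausible reading is that this depends on whether the target already exists, but the specification does not say so.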
Fig. 9 shows another embodiment of a language translation table wherein a VFS is in communication with a legacy file system. A basic database query such as "Connect" 900, used with a username and password as arguments, grants access to all tables and records. On legacy file systems, individual records are stored as files and therefore an "OPEN" operation is translated to a "CONNECT" command on the VFS, which grants access to all records and tables. As soon as a successful "Connect" 900 is achieved by any VFS user, the user gets access to all the records on the file system, unless he has explicitly been excluded from viewing or manipulating certain records on the file system. Similarly, a "Close" command 920 translates to a "Disconnect" 915 query on the VFS and closes all open connections to the database. The Disk Agent 665 is responsible for handling other file operations or commands such as "Seek" 925, "Tell" 930, "Read" 935, "Write" 940, etc. When an operating system file system is replaced by the VFS, the Disk Agent 665 creates and initializes the long-term memory and creates metadata information to manage the VFS.
Fig. 10 illustrates another language translation table, in which other legacy file system commands are translated into queries in a table language, such as SQL, on the VFS. A file command 1000 such as "Copy" 1005 is translated to an "INSERT" 1010 query on the VFS. Similarly, the file operation "Del" 1015 is translated to a "DELETE" statement 1020 on the VFS. The file operation "REN" 1025 used for renaming is translated to an "UPDATE" query 1030. Also, the file operation "MOVE" 1035 is translated to an "UPDATE" statement 1040.
Similarly, a directory operation 1085 such as "MD", with the directory name as an argument 1045, to make a directory is translated to an "INSERT" statement 1050 on the VFS. Also, a directory operation such as "CD <directory name>" 1055 to change directory is translated to a "SELECT" query 1060 on the VFS. Similarly, "RD" 1065, with the directory name as an argument, to remove a directory is translated to a "DELETE" statement 1070 on the VFS. As depicted in the language translation table of Fig. 11, the operations on the files are supported by the operating system. For a desktop operating system that is either a single or multi-user server operating system, the functionality varies in terms of security and monitoring shared resources. The shared resources generally are directly translated by the file system; that is, shared data is always accessed through the file system, and basic file input or output commands operate in shared mode for a server based file system.
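The mapping of "OPEN"/"Close" to "CONNECT"/"Disconnect", with "Seek", "Tell" and "Read" operating on a stored record, can be sketched as follows. The class `RecordHandle` and the `files` table are hypothetical, and opening an in-memory SQLite connection stands in for the VFS "CONNECT"; no username or password handling is shown.

```python
import sqlite3


class RecordHandle:
    """File-like handle over one record's content column (hypothetical)."""

    def __init__(self, conn, name):
        row = conn.execute(
            "SELECT content FROM files WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        self._data = row[0]
        self._pos = 0

    def seek(self, offset):
        # Clamp the offset to the bounds of the stored record.
        self._pos = max(0, min(offset, len(self._data)))

    def tell(self):
        return self._pos

    def read(self, size):
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


conn = sqlite3.connect(":memory:")  # stands in for CONNECT
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("a.bin", b"0123456789"))

handle = RecordHandle(conn, "a.bin")
handle.seek(4)
chunk = handle.read(3)
position = handle.tell()
conn.close()  # stands in for DISCONNECT
```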
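The "MOVE" to "UPDATE" translation of Fig. 10 can be sketched as a change to the column that records an entry's parent directory. The `entries` table and the function `move` are assumptions for illustration, with SQLite standing in for the VFS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the parent column records the containing directory.
conn.execute(
    "CREATE TABLE entries (parent TEXT, name TEXT, content BLOB, "
    "UNIQUE (parent, name))"
)
conn.execute("INSERT INTO entries VALUES ('/tmp', 'a.txt', ?)", (b"data",))


def move(name, src_dir, dst_dir):
    """Execute 'MOVE' as an UPDATE; returns the number of rows changed."""
    cursor = conn.execute(
        "UPDATE entries SET parent = ? WHERE parent = ? AND name = ?",
        (dst_dir, src_dir, name),
    )
    return cursor.rowcount


moved = move("a.txt", "/tmp", "/home")
location = conn.execute(
    "SELECT parent FROM entries WHERE name = 'a.txt'"
).fetchone()[0]
```

Under this schema a move rewrites one column and leaves the stored content in place, which is consistent with "MOVE" and "REN" both mapping to "UPDATE".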
Whenever the entire file system is simulated in a database, commands like "Open" 1100 and "fopen" 1105 can emulate to "CONNECT" on the VFS. The corresponding commands in a shared file system are "sopen" or "_fopen" 1110. Similarly, the "Close" command 1115 or "fclose" command 1120 on a legacy file system emulates to a "DISCONNECT" on the VFS. The corresponding commands in the case of a shared file system are "fsclose" 1120 or "_sclose" 1125. The "Read" 1130 or "fread" 1135 and "Write" 1140 and "fwrite" 1145 are handled in a similar manner. Commands such as "Seek" 1150 and "fseek" 1155 can manipulate the file handle, the identifier of a file, with respect to the block origin. Commands like "tell" 1160 and "ftell" 1165 can return the file offset with respect to the block offset within the file system database. The actual file data can be translated into database tuples and managed as any normal database record. The files to which the user has access are exposed through a database view, which prevents illegal access to other records (that is, files) in the table.
Any operating system typically has file classifications associated with a predefined functionality or associated application; that is, certain files are binary executables such as ".exe" files, batch executables such as ".bat" files, or ".bmp" files linked with a painting application like Paintbrush. Whenever a user tries to execute these files, the VFS spawns or forks a process which performs this execution irrespective of the operating system on which the VFS is implemented. The associations of the file functionality or any associated application are saved in the metadata of the VFS database.
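The use of a database view to expose only the files a user may access can be sketched as below. The `files` table, the `owner` column, the user names, and the per-user view `alice_files` are all assumptions; the specification does not state how access rights are recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each record (file) carries the name of its owner.
conn.execute("CREATE TABLE files (owner TEXT, name TEXT, content BLOB)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("alice", "notes.txt", b"a"), ("bob", "secret.txt", b"b")],
)

# The view exposes only alice's records; other rows stay hidden behind it.
conn.execute(
    "CREATE VIEW alice_files AS "
    "SELECT name, content FROM files WHERE owner = 'alice'"
)

visible = [row[0] for row in conn.execute("SELECT name FROM alice_files")]
total = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
```

Queries issued against the view see one record, although the underlying table holds two.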
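Storing file associations in metadata and resolving them before a process is spawned can be sketched as follows. The `associations` table, the convention that a missing application means the file is run directly, and the function `build_argv` are assumptions; the sketch builds the argument list only and does not start a process.

```python
import os
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metadata table: NULL application means "run the file itself".
conn.execute(
    "CREATE TABLE associations (extension TEXT PRIMARY KEY, application TEXT)"
)
conn.executemany(
    "INSERT INTO associations VALUES (?, ?)",
    [(".bmp", "paintbrush"), (".bat", None), (".exe", None)],
)


def build_argv(path):
    """Return the argument list a spawned process would be started with."""
    extension = os.path.splitext(path)[1].lower()
    row = conn.execute(
        "SELECT application FROM associations WHERE extension = ?",
        (extension,),
    ).fetchone()
    if row is None:
        raise LookupError("no association for %r" % extension)
    application = row[0]
    return [path] if application is None else [application, path]


image_argv = build_argv("logo.BMP")
program_argv = build_argv("setup.exe")
```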
The scope of the invention is not to be limited by the embodiments described above. Those of skill in the art appreciate that other embodiments that are not described above are within the spirit of the invention. The invention shall be given its full scope as provided in the claims below.

Claims

1. A method for executing a command, the method comprising:
receiving data in a first format from an application; translating the data from the first format to a table structure format if the first format is different from the table structure format; storing the data in the table structure format; and, executing a command in a table language to act on the data that is stored in the table structure format.
2. The method of Claim 1, further comprising: receiving the command in a first language; translating the command into a second language if the first language is not a table language, the second language being a table language; wherein the executing step further comprises executing the command in the first language if the command was not translated into the second language and executing the command in the second language if the command was translated into the second language.
3. The method of Claim 2, wherein the translating the command step further comprises translating the command from an Object Query Language (OQL) to a Structured Query Language (SQL), wherein the first language is the OQL and the second language is the SQL.
4. The method of Claim 1, wherein the receiving step further comprises receiving the data in a first format that includes a corresponding data definition table.
5. The method of Claim 4, further comprising providing a data definition table corresponding to the data at the application and sending the data with the corresponding data definition table to a file system, wherein the receiving step further comprises receiving the data with the corresponding data definition table at the file system.
6. The method of Claim 4, further comprising directly using first data in a second application and a first corresponding data definition table of a first application; wherein the receiving step further comprises receiving the first data in the first format, the first data including the first corresponding data definition table.
7. The method of Claim 1, wherein the storing step further comprises storing the data in a data matrix on a long term memory.
8. The method of Claim 7, wherein the storing step further comprises storing all of the data in a single cell of the data matrix.
9. The method of Claim 7, wherein the storing step further comprises storing the file data in more than one cell of the data matrix.
10. The method of Claim 1, wherein the translating step further comprises translating the data from the first format to an Open Database Connectivity (ODBC) compliant format.
11. A system for executing a command, the system comprising:
a memory including a file system, the file system comprising a data matrix including cells, the cells including formatted data stored in a table structure format; and, a processor in communication with the memory, the processor configured to translate data received from an application in a first format to the table structure format to provide the formatted data, to not translate the data if the first format is a table structure format so that formatted data is maintained, to store the formatted data in the file system, and to execute the formatted data in a table language.
12. The system of Claim 11, wherein the processor is further configured to receive a command in a first language and to translate the command into a second language if the first language is not a table language, the second language being a table language.
13. The system of Claim 12, wherein the memory further comprises a language translation table, the language translation table including commands in the first language and corresponding commands in the table language.
14. The system of Claim 13, wherein the first language is an Object Query Language (OQL) and the table language is a Structured Query Language (SQL).
15. The system of Claim 11, wherein the file system further comprises a data definition table corresponding to the application, the data definition table including information about the formatted data.
16. The system of Claim 15, wherein the formatted data is categorized into types and the definition table includes data definition sub-structures (DDSS), each DDSS corresponding to formatted data of the type corresponding to the DDSS.
17. A virtual file system for executing a command in a table language to act on data that is stored in a table structure format when an application sends the command, the virtual file system comprising: a disk agent, a file translator, and an execution engine in communication with one another, wherein, when the application sends data to the virtual file system in a first format: (a) the disk agent instructs a processor and a memory of a system to receive the data in the first format from the application; (b) the file translator instructs the processor to translate the data from the first format to the table structure format if the first format is not the table structure format; (c) the disk agent instructs the processor to store the data in the table structure format to the memory; and, when the application sends the command to the virtual file system, (d) the execution engine instructs the processor to execute the command in the table language to act on the data stored in the table structure format; wherein the virtual file system is stored in a media and, when the media is used with the system, the virtual file system is transferred to the memory of the system, the system including the memory and the processor.
18. The virtual file system of Claim 17, further comprising a command analyzer, wherein, when the application sends the command to the virtual file system, the command being sent in a first language: (a) the command analyzer instructs the processor and the memory to receive the command in the first language; and, (b) the command analyzer instructs the processor to translate the command from the first language to a table language if the first language is not the table language.
19. The virtual file system of Claim 18, wherein the command analyzer includes a language translation table, the language translation table including commands in the first language and corresponding commands in the table language.
20. The virtual file system of Claim 17, wherein the first language is OQL and the table language is SQL.
PCT/IN2004/000110 2003-04-21 2004-04-21 A novel file system WO2004109424A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN395MU2003 2003-04-21
IN395/MUM/2003 2003-04-21

Publications (2)

Publication Number Publication Date
WO2004109424A2 true WO2004109424A2 (en) 2004-12-16
WO2004109424A3 WO2004109424A3 (en) 2005-05-19

Family

ID=33495855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2004/000110 WO2004109424A2 (en) 2003-04-21 2004-04-21 A novel file system

Country Status (1)

Country Link
WO (1) WO2004109424A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060189323A1 (en) * 2005-02-24 2006-08-24 Masafumi Usuda Radio resource control method, radio base station, and radio network controller

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5432942A (en) * 1993-06-10 1995-07-11 The United States Of America As Represented By The Secretary Of The Navy Data structure extraction, conversion and display tool
WO2002039322A1 (en) * 2000-11-09 2002-05-16 Accenture L.L.P. Method and system for translating data associated with a relational database

Also Published As

Publication number Publication date
WO2004109424A3 (en) 2005-05-19

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase