WO2005081126A2 - Auditing and tracking changes of data and code in spreadsheets and other documents - Google Patents

Auditing and tracking changes of data and code in spreadsheets and other documents Download PDF

Info

Publication number
WO2005081126A2
WO2005081126A2 PCT/GB2005/000529 GB2005000529W WO2005081126A2 WO 2005081126 A2 WO2005081126 A2 WO 2005081126A2 GB 2005000529 W GB2005000529 W GB 2005000529W WO 2005081126 A2 WO2005081126 A2 WO 2005081126A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
changes
spreadsheet
monitoring
Prior art date
Application number
PCT/GB2005/000529
Other languages
French (fr)
Other versions
WO2005081126A3 (en
Inventor
Andrew Reeves
James Culverwell
Aaron Wittman
Original Assignee
Cluster Seven Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cluster Seven Limited filed Critical Cluster Seven Limited
Priority to JP2006552693A priority Critical patent/JP5312742B2/en
Priority to EP05708347A priority patent/EP1719060A2/en
Publication of WO2005081126A2 publication Critical patent/WO2005081126A2/en
Publication of WO2005081126A3 publication Critical patent/WO2005081126A3/en
Priority to US11/461,087 priority patent/US9734139B2/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/197Version control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/177Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Definitions

  • This invention relates to auditing and tracking changes of data and code in spreadsheets and other documents.
  • a spreadsheet application is a computer program that allows a user to manipulate two- dimensional and three-dimensional arrays of data.
  • a user of a spreadsheet application is presented with a two-dimensional array of cells, each cell being capable of storing an item of numeric data, text, a software object, or a formula.
  • a three-dimensional spreadsheet has several of such arrays having a definite order. Where a cell contains a formula, the display normally shows the result of the calculation that the formula defines.
  • the formula can include amongst it input variables the value of another cell within the spreadsheet or within another spreadsheet. Of course, the value of the other cell may be the result of calculation of a formula within the other cell.
  • Such linkage between cells can be extended to an arbitrary degree of complexity.
  • Spreadsheets were conceived as tools to be used for performing quick calculations on personal data.
  • data and functionality that are critical to the operations of an organisation would traditionally have been stored in databases supporting computer applications that have been developed by programmers to perform specific, controlled actions on the data.
  • Computer spreadsheets have become ubiquitous in many organisations as a result of the growth in power of personal computers. Therefore, in practice, this traditional model of how data is managed is no longer followed in many organisations.
  • Spreadsheets that were hurriedly put together to perform a calculation that the programmed database lacked have become essential tools central to the work of their creators, and, effectively, central to the operation of the business. In the present situation, the features of spreadsheets that make them very attractive to their users become unwelcome from the point of view of the organisation as a whole.
  • a spreadsheet can be changed very easily.
  • the interconnected nature of the cells in a spreadsheet can mean that the consequences of such a change can be very difficult to foresee. This is especially so in cases where the spreadsheet is referenced by another spreadsheet, possibly developed by another person. It is also not possible to determine the content of a spreadsheet at any given moment, so that the reasons behind a conclusion being drawn from the content of a spreadsheet may not be traceable.
  • the power of modern personal computers is such as to make practicable implementation of a scripting language in spreadsheet applications to perform calculations and actions of arbitrary complexity. Yet this code is not subject to version control and change tracking familiar to programmers in traditional programming languages.
  • An aim of this invention is to provide a mechanism whereby actions performed upon a spreadsheet can be recorded in such a manner that the state of the spreadsheet at any particular time can be recovered and the evolution between states can be interrogated. It is a further aim of the invention that the inherent flexibility of the spreadsheet, which makes it so attractive, is not lost.
  • this invention provides a system for monitoring and auditing data in one or more spreadsheets, the system comprising a file store within which files containing spreadsheet data can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
  • the file store may be constituted in many different forms in different embodiments of the invention, and it may include many different components. For example, it may include one or more of a file server and a workstation computer. Components of the file store may be at diverse locations and interconnected by local-area or wide-area network links. Likewise, the monitoring means may access the file store using local-area or wide-area network links. The monitoring means most preferably includes a file location module that searches the file store for spreadsheets that are to be monitored.
  • the monitoring means includes comparison means operative to compare a file in the file store (the current file) with a previous version of that file. This enables the system to determine the differences between success and saved versions of the same file. More specifically, the comparison means may compare each cell in the current file with a corresponding cell in the previous version and, if it is determined that the content of the cell has changed, create a record of that change. Thus, changes can be recorded on a cell-by-cell basis. The record thereby created most advantageously includes data that characterises the nature of the change. Such a record can be inspected to obtain a qualitative understanding of the nature of the changes made to the spreadsheet file.
  • the nature of the change may be classified as being one or more of: data added, data changed, data deleted, function added, function changed, function deleted, data to function, function to data, cell error, function recalculation, [script changes, external reference changes, named range changes, link changes, password changes, sheet added, sheet deleted, sheet renamed, spreadsheet renamed, spreadsheet deleted, spreadsheet added. It will be understood that not all embodiments may include all of these classifications and that some embodiments may include further classifications not listed here.
  • the record may include additional data, such as a time stamp that indicates when the change was made and the identity of a user who made the change.
  • a file being monitored is parsed and converted to a more easily processable form, for example a text-based format (such as, XML).
  • XML has the further advantage that there are many tools available to process it and that it can be represented in both a text-based, serialised form and a hierarchical form.
  • this invention provides a system for monitoring and auditing program code in one or more documents, the system comprising a file store within which files containing documents can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in code contained in the documents within the file store, and to record the changes in the database.
  • the database contains a history of changes made to the code such that the development and evolution of the code over a period of time can be monitored and audited.
  • Embodiments of in this aspect of the invention can work with documents of different types that contain code.
  • these may include spreadsheets, word processor documents, database files and so forth.
  • this invention provides a computer program product, executable on computer hardware, to constitute monitoring means within an embodiment of the first aspect or second aspect of the invention.
  • this invention provides a server for connection to the network for monitoring and auditing data in one or more spreadsheets, the network including a file store within which files containing spreadsheet data can be stored, and a database, the server being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
  • Such server may constitute monitoring means in the first aspect of the invention.
  • embodiment of the invention does not interfere with the operation of the clients and that, as far as possible, the presence of the embodiment is entirely transparent to users of the clients.
  • the embodiment of the invention operate asynchronously of the clients on the network. Specifically, operation of the clients should not be blocked while the embodiment processes files saved within the file store.
  • Figure 1 is a high-level diagrammatic overview of an embodiment of the invention
  • Figure 2 illustrates a process of maintaining data in the embodiment of Figure 1;
  • Figure 3 illustrates a process of capturing files in the embodiment of Figure 1;
  • Figure 4 illustrate a component of the embodiment of Figure 1
  • Figure 5 is an overview of a network of computers within an enterprise in which the present invention is embodied
  • FIG. 6 is a block diagram that shows the principal functional components of an embodiment of the invention.
  • Figure 7 is a diagram illustrating a base class from which other task classes are derived;
  • Figure 8 is a diagram illustrating at various task classes;
  • Figure 9 is a diagram illustrating the operation of the file retrieve class
  • Figure 10 is a diagram illustrating the operation of the transform class
  • Figure 11 is a diagram illustrating the operation of the file retrieve class
  • Figure 12 is a diagram illustrating the operation of the macro decomposer class
  • Figure 13 is a diagram illustrating the operation of the compare class
  • Figure 14 is a diagram illustrating the operation of the bulk load class
  • Figure 15 is a diagram illustrating the operation of the XML storage class.
  • Figures 16 to 19 are screenshots showing various displays generated during operation of the embodiment.
  • the embodiment is a system that works in conjunction with an existing business desktop and networking environment.
  • the system identifies the existence of spreadsheet files within the network, and then captures those files selected by the business for management.
  • the system then builds a database of the content of those files.
  • the system monitors the network for events that occur in the managed file.
  • trigger events including (but not exclusively) a file save of a spreadsheet
  • the system will retrieve a copy of the file, decompose it to elemental parts and compare this with the original held in the database.
  • the system then saves changes made in the database. Where the file has been renamed, deleted or moved, it records this in the database.
  • the system allows the business to view data collected in the database centrally. This system can aggregate and analyse the data using a portal employing existing technologies.
  • the system can access multiple network domains to do its work in order to maintain the files under management.
  • the system converts the contents of the files into a representation in the Extensible Markup Language (XML) and use this format to make comparisons.
  • XML Extensible Markup Language
  • the system works at the most granular level to save files, track and identify and capture changes to files.
  • the database is populated at this very granular level.
  • the business user can apply their own reporting tools and analysis choices to the data.
  • the invention does not restrict what is done with the data.
  • the system can work in domains where access has been given. It will seek out files, events and post event versions of the files to build a database.
  • the system scans targeted network environments, and pulls out the list of all files to be monitored.
  • the invention identifies the existence of files, where new files have been created since the last stand and where files have been deleted.
  • the scan generates a report enabling either an automatic software tool or the business to trigger the capture of files for either the start of the process ( Figure 1) or the maintenance of files ( Figure 2).
  • This embodiment of the invention is implemented within a network of computers.
  • this network there is an arbitrarily large number of clients, which are personal computers upon which users run spreadsheet applications to develop and use spreadsheets. Users may also store their personal spreadsheet files upon these computers. Files can be monitored on any drive linked to the network. Usually, but not always, important spreadsheets are located on central file servers. Also on the network is a monitoring server, which is responsible for monitoring and recording activities performed by users upon the spreadsheets.
  • a network of computers within an enterprise comprises an arbitrarily large number of client computers 10.
  • Each client computer 10 is used by a user to create personal documents, including spreadsheets.
  • each client computer 10 has a single user who will have their own personal preferences and working practices associated with their spreadsheets.
  • Each client computer 10 is connected to a network 12. For the purposes of illustration, this is shown in Figure 1 as an integrated network. However, in practical embodiments, it is likely to include many subnets, and local- area and wide-area network links. Also connected to the network 12 is one are more file server 14. Depending on network policies, a user may choose to store files either on their client machine 10 or on a file server 14. Together, local storage provided by each client computer 10 and the or each file server 14 constitute a diverse file store.
  • a monitoring server 16 is connected to the network 12 such that it has access to the entire file store, or to part of the file store, according to the policy of its operator. It is within the monitoring server 16 that the processes provided by the invention are performed.
  • the configuration of the monitoring server 16 is highly dependent upon the nature of the network as a whole.
  • the monitoring server 16 must be capable of performing its task with minimal delay upon operation of software executed on the clients 10.
  • the monitoring server 16 may be a monolithic machine while in other cases it may be a cluster of machines controlled by servers 44 to enable parallel processing of multiple documents.
  • FIG. 6 the entirety of spreadsheet files within the file store is shown at 20.
  • the other items shown in Figure 6 are logical components of the monitoring server 16. These components are as follows: a watching service 22, a controller service 24 and, and meta store database 26, and an asynchronous framework 30 that includes a monitoring service 32 and a process service 34. Operation of the system as whole is controlled by passing task messages between the various logical components. This has the advantage that it enables the system to be scaled by adding further machines into a cluster and distributing task messages amongst them. It also allows the monitoring server to process the spreadsheet files 20 asynchronously of the clients, thereby ensuring a minimum of interference with the operation of their users' spreadsheet programs.
  • the function of the watching service 22 is to monitor the spreadsheet files 20 and to raise an event when any of them changes.
  • the file is read, its current contents compared with the contents of the file when it was last read, and changes which are identified are then stored in a database.
  • every time the file in the file store changes to a record is made. It is therefore possible, starting with the file as it is in the file store, to apply the recorded changes in reverse and thereby arrive at the file as it was at an earlier stage in its development. Provision of the watching service 22 obviates the need to provide hooks into the spreadsheet programs to trigger an action upon a file been changed.
  • this embodiment is concerned with processing spreadsheets created using Microsoft Excel.
  • This application stores spreadsheets in files that conventionally have a filename extension ".XLS” (conveniently referred to as “XLS files”). These are binary files.
  • XLS files are binary files.
  • the principles of the invention could equally well be applied to spreadsheets created using other applications irrespective of how these applications store spreadsheet data. Since it is notoriously difficult to perform content analysis on binary files, the approach taken by this embodiment is to first convert XLS files into a representation in the Extensible Markup Language (XML).
  • Other spreadsheet programs notably OpenOffice.org Calc use XML as their native file format. Therefore, the conversion step can be omitted when processing these files.
  • this embodiment is implemented on a cluster of computers. Operations performed by the server are carried out as discrete tasks. Task objects are created and are placed in a task queue. When a task reaches the head of the queue, it is processed by the first available machine in the cluster.
  • FIG. 3 illustrates the structure of the parent class from which all tasks are derived.
  • Four main methods are exposed by every task object. These are: Operate Task, Start Task, Run Task and End Task.
  • Operate Task is the method for the asynchronous framework to launch the object. It also makes a log entry into tracing code. It takes as an input a Transaction State Object (TSO) which contains the state variables for a task to use within its processing.
  • TSO Transaction State Object
  • the Start Task method sets up running configuration variables. Run Task performs all functions for completing the task. End Task sets all finishing operations.
  • Transform this class is responsible for taking an XLS file and deconstructing the file into XML worksheets.
  • the Transform class calls the Comparison class and Macro class to perform all delta operations. This class will deconstruct all named ranges and detect any changes. This class also spawns the Compare and Macro Decomposer tasks into the asynchronous framework.
  • Compare this class is responsible for calculating and classifying the state at the most recent time point and a description of the changes, known as "deltas" between two time points of the XML.
  • This class classifies all change types and cell types and tags each delta change with the associated change. This class uses the following algorithm: • Position the pointer at the first cell from the previous XML file and current file XML file.
  • this class is responsible for storing the XML file created from the transformation of the document. This class serves to compress the current generated XML file, remove the previous file and upload the new file to the datastore.
  • Macro Decomposer this class calculates the differences between two programmatic scripts at two time points.
  • the capture script may be integrated into an existing source code control system 40.
  • the embodiment provides a Web-based interface through which a user can gain access to data within the database.
  • This provides a central point from which all spreadsheet activity can be monitored.
  • a process running on the monitoring server 16 generates queries that are applied to the database and from the results of those queries generates pages that can be served to a Web browser by a Web server 38.
  • the range of queries and reports that can be generated is essentially without limit. These are tailored to the specific requirements of a particular user of the system.
  • the same interface can also be used for configuration and maintenance of the system. Provision of a Web-based interface to a database is a routine matter and the details will not therefore be described here further.
  • alerts based on the result of the queries. For example, an administrator may define a range of prohibited actions for some parts of a specific spreadsheet, or may wish to be alerted when the value of a particular cell moves outside of a predetermined range.
  • the system may generate and dispatch an e-mail message to a predetermined address when it detects that an alert should be issued.
  • Figure 16 is a screen that allows a user to see which files are being monitored and, where files are linked, the hierarchy of linked files.
  • Figure 17 shows a screen that allows the user to monitor how the value of a particular cell within a spreadsheet changes over time. Value is represented graphically in this screen.
  • Figure 18 shows a screen that presents a user with a summary of changes that have been made to the spreadsheet or over time. The nature of each change and the person who made it are identified.
  • Figure 19 illustrates the source code control aspect of the invention.
  • the right-hand pane shows a segment of program code in its current version, while the left- hand pane shows the previous version of the code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A system, server and computer program product for monitoring and auditing data in one or more spreadsheets are disclosed. The System comprises a file store within which files containing spreadsheet data can be stored, a database, and monitoring means. The monitoring means operates to detect changes in spreadsheet data within the file store, and to record the changes in the database. Also disclosed are a system, server and a computer program product for monitoring and auditing program code within one or more documents, such as spreadsheet documents. These aspects of the invention are, in typical embodiments, provided in combination with one another.

Description

Auditing and tracking changes of data and code in spreadsheets and other documents
This invention relates to auditing and tracking changes of data and code in spreadsheets and other documents.
A spreadsheet application is a computer program that allows a user to manipulate two- dimensional and three-dimensional arrays of data. A user of a spreadsheet application is presented with a two-dimensional array of cells, each cell being capable of storing an item of numeric data, text, a software object, or a formula. A three-dimensional spreadsheet has several of such arrays having a definite order. Where a cell contains a formula, the display normally shows the result of the calculation that the formula defines. The formula can include amongst it input variables the value of another cell within the spreadsheet or within another spreadsheet. Of course, the value of the other cell may be the result of calculation of a formula within the other cell. Such linkage between cells can be extended to an arbitrary degree of complexity.
Spreadsheets were conceived as tools to be used for performing quick calculations on personal data. In contrast, data and functionality that are critical to the operations of an organisation would traditionally have been stored in databases supporting computer applications that have been developed by programmers to perform specific, controlled actions on the data. Computer spreadsheets have become ubiquitous in many organisations as a result of the growth in power of personal computers. Therefore, in practice, this traditional model of how data is managed is no longer followed in many organisations. Spreadsheets that were hurriedly put together to perform a calculation that the programmed database lacked have become essential tools central to the work of their creators, and, effectively, central to the operation of the business. In the present situation, the features of spreadsheets that make them very attractive to their users become unwelcome from the point of view of the organisation as a whole. A spreadsheet can be changed very easily. However, the interconnected nature of the cells in a spreadsheet can mean that the consequences of such a change can be very difficult to foresee. This is especially so in cases where the spreadsheet is referenced by another spreadsheet, possibly developed by another person. It is also not possible to determine the content of a spreadsheet at any given moment, so that the reasons behind a conclusion being drawn from the content of a spreadsheet may not be traceable. Moreover, the power of modern personal computers is such as to make practicable implementation of a scripting language in spreadsheet applications to perform calculations and actions of arbitrary complexity. Yet this code is not subject to version control and change tracking familiar to programmers in traditional programming languages.
Clearly, this situation is unacceptable in a large organisation since it renders the quality control of data and functionality impossible, and provides an obstacle to creation of auditable business activity (such as preparation of accounts) within the business that are required as a deterrent to financial mismanagement.
An aim of this invention is to provide a mechanism whereby actions performed upon a spreadsheet can be recorded in such a manner that the state of the spreadsheet at any particular time can be recovered and the evolution between states can be interrogated. It is a further aim of the invention that the inherent flexibility of the spreadsheet, which makes it so attractive, is not lost.
From a first aspect, this invention provides a system for monitoring and auditing data in one or more spreadsheets, the system comprising a file store within which files containing spreadsheet data can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
By storing suitable data within the database, it is possible to construct a complete history of all changes made to an individual spreadsheet. The file store may be constituted in many different forms in different embodiments of the invention, and it may include many different components. For example, it may include one or more of a file server and a workstation computer. Components of the file store may be at diverse locations and interconnected by local-area or wide-area network links. Likewise, the monitoring means may access the file store using local-area or wide-area network links. The monitoring means most preferably includes a file location module that searches the file store for spreadsheets that are to be monitored.
In a typical embodiment, the monitoring means includes comparison means operative to compare a file in the file store (the current file) with a previous version of that file. This enables the system to determine the differences between success and saved versions of the same file. More specifically, the comparison means may compare each cell in the current file with a corresponding cell in the previous version and, if it is determined that the content of the cell has changed, create a record of that change. Thus, changes can be recorded on a cell-by-cell basis. The record thereby created most advantageously includes data that characterises the nature of the change. Such a record can be inspected to obtain a qualitative understanding of the nature of the changes made to the spreadsheet file. The nature of the change may be classified as being one or more of: data added, data changed, data deleted, function added, function changed, function deleted, data to function, function to data, cell error, function recalculation, [script changes, external reference changes, named range changes, link changes, password changes, sheet added, sheet deleted, sheet renamed, spreadsheet renamed, spreadsheet deleted, spreadsheet added. It will be understood that not all embodiments may include all of these classifications and that some embodiments may include further classifications not listed here. The record may include additional data, such as a time stamp that indicates when the change was made and the identity of a user who made the change.
It is well recognised that files stored in a binary format are inherently difficult to compare with one another. Therefore, in particularly effective embodiments of the invention, a file being monitored is parsed and converted to a more easily processable form, for example a text-based format (such as, XML). XML has the further advantage that there are many tools available to process it and that it can be represented in both a text-based, serialised form and a hierarchical form.
In parallel with their development as calculation tools, spreadsheet applications have incorporated ever-more powerful scripting capability. This has evolved to the extent that spreadsheet applications can now be considered to be powerful programming environments. For example, in the case of Microsoft Excel, the programming language Visual Basic for Applications (VBA) is provided as a scripting language that is capable of implementing programmes of high complexity. As with spreadsheet data, this offers a user the ability to generate programmes quickly and flexibly. However, this flexibility means that there is little control over the development of the software. In particular, there is no version control or source code control system so it is not possible to see why changes have been made to program code after the event.
It is a further aim of this invention to provide means for managing code contained within a document such as a spreadsheet without limitation to the flexibility offered to a user.
From another aspect, this invention provides a system for monitoring and auditing program code in one or more documents, the system comprising a file store within which files containing documents can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in code contained in the documents within the file store, and to record the changes in the database.
As with embodiment, which is concerned with spreadsheet files, the database contains a history of changes made to the code such that the development and evolution of the code over a period of time can be monitored and audited.
Embodiments of in this aspect of the invention can work with documents of different types that contain code. For example, these may include spreadsheets, word processor documents, database files and so forth.
In many cases, both of these aspects of the invention will be provided by a common system. From another aspect, this invention provides a computer program product, executable on computer hardware, to constitute monitoring means within an embodiment of the first aspect or second aspect of the invention.
From a further aspect, this invention provides a server for connection to the network for monitoring and auditing data in one or more spreadsheets, the network including a file store within which files containing spreadsheet data can be stored, and a database, the server being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
Such server may constitute monitoring means in the first aspect of the invention.
In all of the above cases, it is preferable that operation of embodiment of the invention does not interfere with the operation of the clients and that, as far as possible, the presence of the embodiment is entirely transparent to users of the clients. To this end, it is highly preferable that the embodiment of the invention operate asynchronously of the clients on the network. Specifically, operation of the clients should not be blocked while the embodiment processes files saved within the file store.
An embodiment of the invention will now be described in detail, by way of example, and with reference to the accompanying drawings, in which:
Figure 1 is a high-level diagrammatic overview of an embodiment of the invention;
Figure 2 illustrates a process of maintaining data in the embodiment of Figure 1;
Figure 3 illustrates a process of capturing files in the embodiment of Figure 1;
Figure 4 illustrate a component of the embodiment of Figure 1;
Figure 5 is an overview of a network of computers within an enterprise in which the present invention is embodied;
Figure 6 is a block diagram that shows the principal functional components of an embodiment of the invention;
Figure 7 is a diagram illustrating a base class from which other task classes are derived; Figure 8 is a diagram illustrating at various task classes;
Figure 9 is a diagram illustrating the operation of the file retrieve class;
Figure 10 is a diagram illustrating the operation of the transform class;
Figure 11 is a diagram illustrating the operation of the file retrieve class;
Figure 12 is a diagram illustrating the operation of the macro decomposer class;
Figure 13 is a diagram illustrating the operation of the compare class;
Figure 14 is a diagram illustrating the operation of the bulk load class;
Figure 15 is a diagram illustrating the operation of the XML storage class; and
Figures 16 to 19 are screenshots showing various displays generated during operation of the embodiment.
An overview of the embodiment.
As shown in Figure 1, the embodiment is a system that works in conjunction with an existing business desktop and networking environment. In operation, the system identifies the existence of spreadsheet files within the network, and then captures those files selected by the business for management. The system then builds a database of the content of those files. Once the first database is built, the system monitors the network for events that occur in the managed file. At selected trigger events, including (but not exclusively) a file save of a spreadsheet, the system will retrieve a copy of the file, decompose it to elemental parts and compare this with the original held in the database. The system then saves changes made in the database. Where the file has been renamed, deleted or moved, it records this in the database. The system allows the business to view data collected in the database centrally. This system can aggregate and analyse the data using a portal employing existing technologies.
As shown in Figure 2, the system can access multiple network domains to do its work in order to maintain the files under management. The system converts the contents of the files into a representation in the Extensible Markup Language (XML) and use this format to make comparisons. The system works at the most granular level to save files, track and identify and capture changes to files. The database is populated at this very granular level. The business user can apply their own reporting tools and analysis choices to the data. The invention does not restrict what is done with the data.
As shown in Figure 3, the system can work in domains where access has been given. It will seek out files, events and post event versions of the files to build a database.
As shown in Figure 4, the system scans targeted network environments, and pulls out the list of all files to be monitored. The invention identifies the existence of files, where new files have been created since the last stand and where files have been deleted. The scan generates a report enabling either an automatic software tool or the business to trigger the capture of files for either the start of the process (Figure 1) or the maintenance of files (Figure 2).
The embodiment in greater detail.
This embodiment of the invention is implemented within a network of computers. In this network there is an arbitrarily large number of clients, which are personal computers upon which users run spreadsheet applications to develop and use spreadsheets. Users may also store their personal spreadsheet files upon these computers. Files can be monitored on any drive linked to the network. Usually, but not always, important spreadsheets are located on central file servers. Also on the network is a monitoring server, which is responsible for monitoring and recording activities performed by users upon the spreadsheets.
As shown in Figure 5, a network of computers within an enterprise comprises an arbitrarily large number of client computers 10. Each client computer 10 is used by a user to create personal documents, including spreadsheets. Typically, each client computer 10 has a single user who will have their own personal preferences and working practices associated with their spreadsheets.
Each client computer 10 is connected to a network 12. For the purposes of illustration, this is shown in Figure 1 as an integrated network. However, in practical embodiments, it is likely to include many subnets, and local- area and wide-area network links. Also connected to the network 12 is one are more file server 14. Depending on network policies, a user may choose to store files either on their client machine 10 or on a file server 14. Together, local storage provided by each client computer 10 and the or each file server 14 constitute a diverse file store.
A monitoring server 16 is connected to the network 12 such that it has access to the entire file store, or to part of the file store, according to the policy of its operator. It is within the monitoring server 16 that the processes provided by the invention are performed. The configuration of the monitoring server 16 is highly dependent upon the nature of the network as a whole. The monitoring server 16 must be capable of performing its task with minimal delay upon operation of software executed on the clients 10. In some cases, the monitoring server 16 may be a monolithic machine while in other cases it may be a cluster of machines controlled by servers 44 to enable parallel processing of multiple documents. These will now be described in greater detail.
In Figure 6, the entirety of spreadsheet files within the file store is shown at 20. The other items shown in Figure 6 are logical components of the monitoring server 16. These components are as follows: a watching service 22, a controller service 24 and, and meta store database 26, and an asynchronous framework 30 that includes a monitoring service 32 and a process service 34. Operation of the system as whole is controlled by passing task messages between the various logical components. This has the advantage that it enables the system to be scaled by adding further machines into a cluster and distributing task messages amongst them. It also allows the monitoring server to process the spreadsheet files 20 asynchronously of the clients, thereby ensuring a minimum of interference with the operation of their users' spreadsheet programs.
The function of the watching service 22 is to monitor the spreadsheet files 20 and to raise an event when any of them changes. Upon detection of a change, the file is read, its current contents compared with the contents of the file when it was last read, and changes which are identified are then stored in a database. Thus, every time the file in the file store changes to a record is made. It is therefore possible, starting with the file as it is in the file store, to apply the recorded changes in reverse and thereby arrive at the file as it was at an earlier stage in its development. Provision of the watching service 22 obviates the need to provide hooks into the spreadsheet programs to trigger an action upon a file been changed.
In general, this embodiment is concerned with processing spreadsheets created using Microsoft Excel. This application stores spreadsheets in files that conventionally have a filename extension ".XLS" (conveniently referred to as "XLS files"). These are binary files. However, the principles of the invention could equally well be applied to spreadsheets created using other applications irrespective of how these applications store spreadsheet data. Since it is notoriously difficult to perform content analysis on binary files, the approach taken by this embodiment is to first convert XLS files into a representation in the Extensible Markup Language (XML). Other spreadsheet programs, notably OpenOffice.org Calc use XML as their native file format. Therefore, the conversion step can be omitted when processing these files.
As has previously been discussed, this embodiment is implemented on a cluster of computers. Operations performed by the server are carried out as discrete tasks. Task objects are created and are placed in a task queue. When a task reaches the head of the queue, it is processed by the first available machine in the cluster.
Figure 3 illustrates the structure of the parent class from which all tasks are derived. Four main methods are exposed by every task object. These are: Operate Task, Start Task, Run Task and End Task. Operate Task is the method for the asynchronous framework to launch the object. It also makes a log entry into tracing code. It takes as an input a Transaction State Object (TSO) which contains the state variables for a task to use within its processing. The Start Task method sets up running configuration variables. Run Task performs all functions for completing the task. End Task sets all finishing operations.
The principal classes derived from that class illustrated in Figure 3 are themselves illustrated in Figure 4. These include task objects to retrieve a file from the file store, transform and compare the file contents, process macros, handle storage within the database, and handling change information. These classes will now be described briefly.
File Retrieve: this class is responsible for copying the file from the targeted changed location Transform: this class is responsible for taking an XLS file and deconstructing the file into XML worksheets. The Transform class calls the Comparison class and Macro class to perform all delta operations. This class will deconstruct all named ranges and detect any changes. This class also spawns the Compare and Macro Decomposer tasks into the asynchronous framework.
Compare: this class is responsible for calculating and classifying the state at the most recent time point and a description of the changes, known as "deltas" between two time points of the XML. This class classifies all change types and cell types and tags each delta change with the associated change. This class uses the following algorithm: • Position the pointer at the first cell from the previous XML file and current file XML file.
• Compare the cell matrix positions (row position, column position) of the two cells. If the current and previous cell positions are the same, then compare the cell values and formulas to check for differences. The comparison will identify and classify the type of change. If the cell contains a formula the cell is tagged with the type of "formula" to distinguish clearly between a cell that contains a fixed numerical value and one that, when displayed, shows the same value as a result of a formula evaluation. The difference between the cell changes is also established. For an integer value the difference is between the two integers. A string difference is based on the length of each string. A date difference is calculated by the number of days. If the current cell position is greater than the previous cell position, then add a "data added" record to the delta XML file. If the previous cell position is greater than the current cell position, then add a "data deleted" record to the delta XML file. • Get the next cell from the file where the cell matrix position is the smallest and start the comparison process again by comparing the cell matrix positions.
• Repeat this process until both the previous and current XML file have been read until the end of the file. The completed process results in a delta XML file. This contains the categorised differences between the current and previous time points.
Bulk Load: this class is responsible for bulk inserting the delta file created from the Transform/Compare task.
XML Storage: this class is responsible for storing the XML file created from the transformation of the document. This class serves to compress the current generated XML file, remove the previous file and upload the new file to the datastore.
Macro Decomposer: this class calculates the differences between two programmatic scripts at two time points. The capture script may be integrated into an existing source code control system 40.
Limit Checker: this class monitors change types occurring within the datastore and provides alerts to the specified targets.
If the embodiment provides a Web-based interface through which a user can gain access to data within the database. This provides a central point from which all spreadsheet activity can be monitored. A process running on the monitoring server 16 generates queries that are applied to the database and from the results of those queries generates pages that can be served to a Web browser by a Web server 38. The range of queries and reports that can be generated is essentially without limit. These are tailored to the specific requirements of a particular user of the system. The same interface can also be used for configuration and maintenance of the system. Provision of a Web-based interface to a database is a routine matter and the details will not therefore be described here further.
Additionally, it is possible to generate alerts based on the result of the queries. For example, an administrator may define a range of prohibited actions for some parts of a specific spreadsheet, or may wish to be alerted when the value of a particular cell moves outside of a predetermined range. The system may generate and dispatch an e-mail message to a predetermined address when it detects that an alert should be issued. Several of the main screens provided by other Web-based interface will now be described briefly.
Figure 16 is a screen that allows a user to see which files are being monitored and, where files are linked, the hierarchy of linked files.
Figure 17 shows a screen that allows the user to monitor how the value of a particular cell within a spreadsheet changes over time. Value is represented graphically in this screen.
Figure 18 shows a screen that presents a user with a summary of changes that have been made to the spreadsheet or over time. The nature of each change and the person who made it are identified.
Figure 19 illustrates the source code control aspect of the invention. In Figure 19, the right-hand pane shows a segment of program code in its current version, while the left- hand pane shows the previous version of the code.
Microsoft and Excel are trade marks of Microsoft Corporation.

Claims

Claims
1. A system for monitoring and auditing data in one or more spreadsheets, the system comprising a file store within which files containing spreadsheet data can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
2. A system according to claim 1 in which changes are detected and recorded on each occasion that the spreadsheet data within the file store or changes.
3. A system according to claim 1 or claim 2 including file monitoring means operative to monitor files within the file store and to initiate an action upon detection of a change in a file in the file store.
4. A system according to any preceding claim in which the monitoring means includes comparison means operative to compare a file in the file store (the current file) with a previous version of that file.
5. A system according to claim 4 in which the comparison means compares each cell in the current file with a corresponding cell in the previous version and, if it is determined that the content of the cell has changed, creates a record of that change.
6. A system according to claim 5 which the record includes data that classifies the nature of the change.
7. A system according to claim 6 in which the nature of the change is classified as being one or more of: data added, data changed, data deleted, function added, function changed, function deleted, data to function, function to data, cell error, function recalculation, script changes, external reference changes, named range changes, link changes, password changes, sheet added, sheet deleted, sheet renamed, spreadsheet renamed, spreadsheet deleted, spreadsheet added.
8. A system according to any one of claims 5 to 7 in which the record includes a time stamp indicating the time at which the change was made.
9. A system according to any one of claims 5 to 8 in which the record includes data that identifiers the user who made the change.
10. A system according to any cone of claims 5 to 9 in which a file being monitored is parsed and converted to XML prior to processing by the comparison means.
11. A system according to any preceding claim in which the monitoring means operates asynchronously of the network clients that create the files it is operative to monitor.
12. A system for monitoring and auditing program code in one or more documents, the system comprising a file store within which files containing documents can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in code contained in the documents within the file store, and to record the changes in the database.
13. A system for monitoring and auditing program code according to claim 1 in which the documents include one or more of spreadsheets, word processor files and database files.
14. A system for monitoring and auditing program code in combination with a system for monitoring and auditing data in one or more spreadsheets according to any one of claims 1 to 12.
15. A computer program product, executable on computer hardware, to constitute monitoring means within a system according to any one of claims 1 to 12.
16. A server for connection to a network for monitoring and auditing data in one or more spreadsheets, the network including a file store within which files containing spreadsheet data can be stored, and a database, the server being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
PCT/GB2005/000529 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents WO2005081126A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2006552693A JP5312742B2 (en) 2004-02-12 2005-02-14 Audit and track data and code changes in spreadsheets and other documents
EP05708347A EP1719060A2 (en) 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents
US11/461,087 US9734139B2 (en) 2005-02-14 2006-07-31 Auditing and tracking changes of data and code in spreadsheets and other documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0403059.9A GB0403059D0 (en) 2004-02-12 2004-02-12 Xigence version 1.0
GB0403059.9 2004-02-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/461,087 Continuation-In-Part US9734139B2 (en) 2005-02-14 2006-07-31 Auditing and tracking changes of data and code in spreadsheets and other documents

Publications (2)

Publication Number Publication Date
WO2005081126A2 true WO2005081126A2 (en) 2005-09-01
WO2005081126A3 WO2005081126A3 (en) 2006-06-22

Family

ID=32011746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/000529 WO2005081126A2 (en) 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents

Country Status (4)

Country Link
EP (1) EP1719060A2 (en)
JP (1) JP5312742B2 (en)
GB (1) GB0403059D0 (en)
WO (1) WO2005081126A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007072051A2 (en) * 2005-12-23 2007-06-28 Mark Martindale Data tracking system
WO2008015395A2 (en) 2006-07-31 2008-02-07 Cluster Seven Limited Storage and processing of spreadsheets and other documents
US8656270B2 (en) 2006-08-18 2014-02-18 International Business Machines Corporation Change-oriented spreadsheet application
US8656272B2 (en) * 2006-04-28 2014-02-18 Microsoft Corporation Persisting instance-level report customizations
US9501463B2 (en) 2005-12-08 2016-11-22 Microsoft Technology Licensing, Llc Spreadsheet cell-based notifications
WO2016193679A1 (en) 2015-05-29 2016-12-08 Cluster Seven Limited Method and system for visualizing or interacting with array data using limited-resolution display devices
US9734139B2 (en) 2005-02-14 2017-08-15 Cluster Seven Limited Auditing and tracking changes of data and code in spreadsheets and other documents

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5303146A (en) 1993-03-11 1994-04-12 Borland International, Inc. System and methods for improved scenario management in an electronic spreadsheet
US5806078A (en) 1994-06-09 1998-09-08 Softool Corporation Version management system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01250142A (en) * 1988-03-30 1989-10-05 Mitsubishi Electric Corp Program editing method in electronic computer
JPH05108437A (en) * 1991-10-14 1993-04-30 Nec Corp Program file alteration history management method
JP4003237B2 (en) * 1996-05-10 2007-11-07 カシオ計算機株式会社 Form output device
JP2001109618A (en) * 1999-10-13 2001-04-20 Nec Eng Ltd History managing file system
JP2001256043A (en) * 2000-03-10 2001-09-21 Toshiba Corp Correction history managing method and correction history management system of program source
JP2003058534A (en) * 2001-08-09 2003-02-28 Casio Comput Co Ltd Data processor and program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5303146A (en) 1993-03-11 1994-04-12 Borland International, Inc. System and methods for improved scenario management in an electronic spreadsheet
US5806078A (en) 1994-06-09 1998-09-08 Softool Corporation Version management system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J DAVID EISENBERG: "OASIS OpenDocument Essentials - Using OASIS OpenDocument XML", 9 December 2003, article "Text Documents - Advanced", pages: 6
PAUL GOLDWATER; LOIS S MAHONEY: "Collaborate on Spreadsheets", JOURNAL OF ACCOUNTANCY ONLINE ISSUES, January 2004 (2004-01-01)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734139B2 (en) 2005-02-14 2017-08-15 Cluster Seven Limited Auditing and tracking changes of data and code in spreadsheets and other documents
US9501463B2 (en) 2005-12-08 2016-11-22 Microsoft Technology Licensing, Llc Spreadsheet cell-based notifications
WO2007072051A2 (en) * 2005-12-23 2007-06-28 Mark Martindale Data tracking system
WO2007072051A3 (en) * 2005-12-23 2008-05-08 Mark Martindale Data tracking system
US8656272B2 (en) * 2006-04-28 2014-02-18 Microsoft Corporation Persisting instance-level report customizations
WO2008015395A2 (en) 2006-07-31 2008-02-07 Cluster Seven Limited Storage and processing of spreadsheets and other documents
WO2008015395A3 (en) * 2006-07-31 2008-10-02 Cluster Seven Ltd Storage and processing of spreadsheets and other documents
JP2009545793A (en) * 2006-07-31 2009-12-24 クラスター セブン リミテッド Storage and processing of spreadsheets and other documents
EP2251796A2 (en) 2006-07-31 2010-11-17 Cluster Seven Limited Storage and processing of spreadsheets
US8656270B2 (en) 2006-08-18 2014-02-18 International Business Machines Corporation Change-oriented spreadsheet application
WO2016193679A1 (en) 2015-05-29 2016-12-08 Cluster Seven Limited Method and system for visualizing or interacting with array data using limited-resolution display devices
US10740535B2 (en) 2015-05-29 2020-08-11 Cluster Seven Limited Method and system for visualizing or interacting with array data using limited-resolution display devices

Also Published As

Publication number Publication date
GB0403059D0 (en) 2004-03-17
EP1719060A2 (en) 2006-11-08
WO2005081126A3 (en) 2006-06-22
JP5312742B2 (en) 2013-10-09
JP2007522575A (en) 2007-08-09

Similar Documents

Publication Publication Date Title
US9734139B2 (en) Auditing and tracking changes of data and code in spreadsheets and other documents
US10558554B2 (en) Machine learning based software correction
JP5592451B2 (en) Storage and processing of spreadsheets and other documents
US8464278B2 (en) Method for performing real-time analytics using a business rules engine on real-time heterogeneous materialized data views
US7974896B2 (en) Methods, systems, and computer program products for financial analysis and data gathering
US11726774B2 (en) Application programming interface for a registry
US6199070B1 (en) Using a database for program logs
KR101083488B1 (en) Impact analysis in an object model
US6370542B1 (en) Method and apparatus for knowledge acquisition and management
US7010546B1 (en) Method and system for testing data sources and database oriented software applications
US7418453B2 (en) Updating a data warehouse schema based on changes in an observation model
US20170351989A1 (en) Providing supply chain information extracted from an order management system
EP1719060A2 (en) Auditing and tracking changes of data and code in spreadsheets and other documents
US8140479B2 (en) Logical classification of objects on a computer system
Mavlyutov et al. Dependency-Driven Analytics: A Compass for Uncharted Data Oceans.
CN113326247B (en) Cloud data migration method and device and electronic equipment
CA2696276A1 (en) Automated data object set administration
US20030014557A1 (en) System and method for transforming operating system audit data to a desired format
US20100100870A1 (en) Configuration deployment management
US20090070743A1 (en) System and method for analyzing software applications
US20040093336A1 (en) Computer program method and apparatus to recognize and normalize data pattern based information
US20220292053A1 (en) Method for generating a coherent representation for at least two log files
CN112948478A (en) Link-based code analysis method and device, electronic equipment and storage medium
US20150006239A1 (en) System, method, and apparatus for fraud detection
Ordonez-Ante et al. Dynamic data transformation for low latency querying in big data systems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 11461087

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2006552693

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

WWE Wipo information: entry into national phase

Ref document number: 2005708347

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005708347

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 11461087

Country of ref document: US