EP1719060A2 - Auditing and tracking changes of data and code in spreadsheets and other documents - Google Patents

Auditing and tracking changes of data and code in spreadsheets and other documents

Info

Publication number
EP1719060A2
EP1719060A2 EP05708347A EP05708347A EP1719060A2 EP 1719060 A2 EP1719060 A2 EP 1719060A2 EP 05708347 A EP05708347 A EP 05708347A EP 05708347 A EP05708347 A EP 05708347A EP 1719060 A2 EP1719060 A2 EP 1719060A2
Authority
EP
European Patent Office
Prior art keywords
data
file
changes
spreadsheet
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP05708347A
Other languages
German (de)
English (en)
French (fr)
Inventor
Andrew Reeves
James Culverwell
Aaron Wittman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cluster Seven Ltd
Original Assignee
Cluster Seven Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cluster Seven Ltd filed Critical Cluster Seven Ltd
Publication of EP1719060A2 publication Critical patent/EP1719060A2/en
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/197Version control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/177Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Definitions

  • This invention relates to auditing and tracking changes of data and code in spreadsheets and other documents.
  • a spreadsheet application is a computer program that allows a user to manipulate two- dimensional and three-dimensional arrays of data.
  • a user of a spreadsheet application is presented with a two-dimensional array of cells, each cell being capable of storing an item of numeric data, text, a software object, or a formula.
  • a three-dimensional spreadsheet has several of such arrays having a definite order. Where a cell contains a formula, the display normally shows the result of the calculation that the formula defines.
  • the formula can include amongst it input variables the value of another cell within the spreadsheet or within another spreadsheet. Of course, the value of the other cell may be the result of calculation of a formula within the other cell.
  • Such linkage between cells can be extended to an arbitrary degree of complexity.
  • Spreadsheets were conceived as tools to be used for performing quick calculations on personal data.
  • data and functionality that are critical to the operations of an organisation would traditionally have been stored in databases supporting computer applications that have been developed by programmers to perform specific, controlled actions on the data.
  • Computer spreadsheets have become ubiquitous in many organisations as a result of the growth in power of personal computers. Therefore, in practice, this traditional model of how data is managed is no longer followed in many organisations.
  • Spreadsheets that were hurriedly put together to perform a calculation that the programmed database lacked have become essential tools central to the work of their creators, and, effectively, central to the operation of the business. In the present situation, the features of spreadsheets that make them very attractive to their users become unwelcome from the point of view of the organisation as a whole.
  • a spreadsheet can be changed very easily.
  • the interconnected nature of the cells in a spreadsheet can mean that the consequences of such a change can be very difficult to foresee. This is especially so in cases where the spreadsheet is referenced by another spreadsheet, possibly developed by another person. It is also not possible to determine the content of a spreadsheet at any given moment, so that the reasons behind a conclusion being drawn from the content of a spreadsheet may not be traceable.
  • the power of modern personal computers is such as to make practicable implementation of a scripting language in spreadsheet applications to perform calculations and actions of arbitrary complexity. Yet this code is not subject to version control and change tracking familiar to programmers in traditional programming languages.
  • An aim of this invention is to provide a mechanism whereby actions performed upon a spreadsheet can be recorded in such a manner that the state of the spreadsheet at any particular time can be recovered and the evolution between states can be interrogated. It is a further aim of the invention that the inherent flexibility of the spreadsheet, which makes it so attractive, is not lost.
  • this invention provides a system for monitoring and auditing data in one or more spreadsheets, the system comprising a file store within which files containing spreadsheet data can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
  • the file store may be constituted in many different forms in different embodiments of the invention, and it may include many different components. For example, it may include one or more of a file server and a workstation computer. Components of the file store may be at diverse locations and interconnected by local-area or wide-area network links. Likewise, the monitoring means may access the file store using local-area or wide-area network links. The monitoring means most preferably includes a file location module that searches the file store for spreadsheets that are to be monitored.
  • the monitoring means includes comparison means operative to compare a file in the file store (the current file) with a previous version of that file. This enables the system to determine the differences between success and saved versions of the same file. More specifically, the comparison means may compare each cell in the current file with a corresponding cell in the previous version and, if it is determined that the content of the cell has changed, create a record of that change. Thus, changes can be recorded on a cell-by-cell basis. The record thereby created most advantageously includes data that characterises the nature of the change. Such a record can be inspected to obtain a qualitative understanding of the nature of the changes made to the spreadsheet file.
  • the nature of the change may be classified as being one or more of: data added, data changed, data deleted, function added, function changed, function deleted, data to function, function to data, cell error, function recalculation, [script changes, external reference changes, named range changes, link changes, password changes, sheet added, sheet deleted, sheet renamed, spreadsheet renamed, spreadsheet deleted, spreadsheet added. It will be understood that not all embodiments may include all of these classifications and that some embodiments may include further classifications not listed here.
  • the record may include additional data, such as a time stamp that indicates when the change was made and the identity of a user who made the change.
  • a file being monitored is parsed and converted to a more easily processable form, for example a text-based format (such as, XML).
  • XML has the further advantage that there are many tools available to process it and that it can be represented in both a text-based, serialised form and a hierarchical form.
  • this invention provides a system for monitoring and auditing program code in one or more documents, the system comprising a file store within which files containing documents can be stored, a database, and monitoring means; the monitoring means being operative to detect changes in code contained in the documents within the file store, and to record the changes in the database.
  • the database contains a history of changes made to the code such that the development and evolution of the code over a period of time can be monitored and audited.
  • Embodiments of in this aspect of the invention can work with documents of different types that contain code.
  • these may include spreadsheets, word processor documents, database files and so forth.
  • this invention provides a computer program product, executable on computer hardware, to constitute monitoring means within an embodiment of the first aspect or second aspect of the invention.
  • this invention provides a server for connection to the network for monitoring and auditing data in one or more spreadsheets, the network including a file store within which files containing spreadsheet data can be stored, and a database, the server being operative to detect changes in spreadsheet data within the file store, and to record the changes in the database.
  • Such server may constitute monitoring means in the first aspect of the invention.
  • embodiment of the invention does not interfere with the operation of the clients and that, as far as possible, the presence of the embodiment is entirely transparent to users of the clients.
  • the embodiment of the invention operate asynchronously of the clients on the network. Specifically, operation of the clients should not be blocked while the embodiment processes files saved within the file store.
  • Figure 1 is a high-level diagrammatic overview of an embodiment of the invention
  • Figure 2 illustrates a process of maintaining data in the embodiment of Figure 1;
  • Figure 3 illustrates a process of capturing files in the embodiment of Figure 1;
  • Figure 4 illustrate a component of the embodiment of Figure 1
  • Figure 5 is an overview of a network of computers within an enterprise in which the present invention is embodied
  • FIG. 6 is a block diagram that shows the principal functional components of an embodiment of the invention.
  • Figure 7 is a diagram illustrating a base class from which other task classes are derived;
  • Figure 8 is a diagram illustrating at various task classes;
  • Figure 9 is a diagram illustrating the operation of the file retrieve class
  • Figure 10 is a diagram illustrating the operation of the transform class
  • Figure 11 is a diagram illustrating the operation of the file retrieve class
  • Figure 12 is a diagram illustrating the operation of the macro decomposer class
  • Figure 13 is a diagram illustrating the operation of the compare class
  • Figure 14 is a diagram illustrating the operation of the bulk load class
  • Figure 15 is a diagram illustrating the operation of the XML storage class.
  • Figures 16 to 19 are screenshots showing various displays generated during operation of the embodiment.
  • the embodiment is a system that works in conjunction with an existing business desktop and networking environment.
  • the system identifies the existence of spreadsheet files within the network, and then captures those files selected by the business for management.
  • the system then builds a database of the content of those files.
  • the system monitors the network for events that occur in the managed file.
  • trigger events including (but not exclusively) a file save of a spreadsheet
  • the system will retrieve a copy of the file, decompose it to elemental parts and compare this with the original held in the database.
  • the system then saves changes made in the database. Where the file has been renamed, deleted or moved, it records this in the database.
  • the system allows the business to view data collected in the database centrally. This system can aggregate and analyse the data using a portal employing existing technologies.
  • the system can access multiple network domains to do its work in order to maintain the files under management.
  • the system converts the contents of the files into a representation in the Extensible Markup Language (XML) and use this format to make comparisons.
  • XML Extensible Markup Language
  • the system works at the most granular level to save files, track and identify and capture changes to files.
  • the database is populated at this very granular level.
  • the business user can apply their own reporting tools and analysis choices to the data.
  • the invention does not restrict what is done with the data.
  • the system can work in domains where access has been given. It will seek out files, events and post event versions of the files to build a database.
  • the system scans targeted network environments, and pulls out the list of all files to be monitored.
  • the invention identifies the existence of files, where new files have been created since the last stand and where files have been deleted.
  • the scan generates a report enabling either an automatic software tool or the business to trigger the capture of files for either the start of the process ( Figure 1) or the maintenance of files ( Figure 2).
  • This embodiment of the invention is implemented within a network of computers.
  • this network there is an arbitrarily large number of clients, which are personal computers upon which users run spreadsheet applications to develop and use spreadsheets. Users may also store their personal spreadsheet files upon these computers. Files can be monitored on any drive linked to the network. Usually, but not always, important spreadsheets are located on central file servers. Also on the network is a monitoring server, which is responsible for monitoring and recording activities performed by users upon the spreadsheets.
  • a network of computers within an enterprise comprises an arbitrarily large number of client computers 10.
  • Each client computer 10 is used by a user to create personal documents, including spreadsheets.
  • each client computer 10 has a single user who will have their own personal preferences and working practices associated with their spreadsheets.
  • Each client computer 10 is connected to a network 12. For the purposes of illustration, this is shown in Figure 1 as an integrated network. However, in practical embodiments, it is likely to include many subnets, and local- area and wide-area network links. Also connected to the network 12 is one are more file server 14. Depending on network policies, a user may choose to store files either on their client machine 10 or on a file server 14. Together, local storage provided by each client computer 10 and the or each file server 14 constitute a diverse file store.
  • a monitoring server 16 is connected to the network 12 such that it has access to the entire file store, or to part of the file store, according to the policy of its operator. It is within the monitoring server 16 that the processes provided by the invention are performed.
  • the configuration of the monitoring server 16 is highly dependent upon the nature of the network as a whole.
  • the monitoring server 16 must be capable of performing its task with minimal delay upon operation of software executed on the clients 10.
  • the monitoring server 16 may be a monolithic machine while in other cases it may be a cluster of machines controlled by servers 44 to enable parallel processing of multiple documents.
  • FIG. 6 the entirety of spreadsheet files within the file store is shown at 20.
  • the other items shown in Figure 6 are logical components of the monitoring server 16. These components are as follows: a watching service 22, a controller service 24 and, and meta store database 26, and an asynchronous framework 30 that includes a monitoring service 32 and a process service 34. Operation of the system as whole is controlled by passing task messages between the various logical components. This has the advantage that it enables the system to be scaled by adding further machines into a cluster and distributing task messages amongst them. It also allows the monitoring server to process the spreadsheet files 20 asynchronously of the clients, thereby ensuring a minimum of interference with the operation of their users' spreadsheet programs.
  • the function of the watching service 22 is to monitor the spreadsheet files 20 and to raise an event when any of them changes.
  • the file is read, its current contents compared with the contents of the file when it was last read, and changes which are identified are then stored in a database.
  • every time the file in the file store changes to a record is made. It is therefore possible, starting with the file as it is in the file store, to apply the recorded changes in reverse and thereby arrive at the file as it was at an earlier stage in its development. Provision of the watching service 22 obviates the need to provide hooks into the spreadsheet programs to trigger an action upon a file been changed.
  • this embodiment is concerned with processing spreadsheets created using Microsoft Excel.
  • This application stores spreadsheets in files that conventionally have a filename extension ".XLS” (conveniently referred to as “XLS files”). These are binary files.
  • XLS files are binary files.
  • the principles of the invention could equally well be applied to spreadsheets created using other applications irrespective of how these applications store spreadsheet data. Since it is notoriously difficult to perform content analysis on binary files, the approach taken by this embodiment is to first convert XLS files into a representation in the Extensible Markup Language (XML).
  • Other spreadsheet programs notably OpenOffice.org Calc use XML as their native file format. Therefore, the conversion step can be omitted when processing these files.
  • this embodiment is implemented on a cluster of computers. Operations performed by the server are carried out as discrete tasks. Task objects are created and are placed in a task queue. When a task reaches the head of the queue, it is processed by the first available machine in the cluster.
  • FIG. 3 illustrates the structure of the parent class from which all tasks are derived.
  • Four main methods are exposed by every task object. These are: Operate Task, Start Task, Run Task and End Task.
  • Operate Task is the method for the asynchronous framework to launch the object. It also makes a log entry into tracing code. It takes as an input a Transaction State Object (TSO) which contains the state variables for a task to use within its processing.
  • TSO Transaction State Object
  • the Start Task method sets up running configuration variables. Run Task performs all functions for completing the task. End Task sets all finishing operations.
  • Transform this class is responsible for taking an XLS file and deconstructing the file into XML worksheets.
  • the Transform class calls the Comparison class and Macro class to perform all delta operations. This class will deconstruct all named ranges and detect any changes. This class also spawns the Compare and Macro Decomposer tasks into the asynchronous framework.
  • Compare this class is responsible for calculating and classifying the state at the most recent time point and a description of the changes, known as "deltas" between two time points of the XML.
  • This class classifies all change types and cell types and tags each delta change with the associated change. This class uses the following algorithm: • Position the pointer at the first cell from the previous XML file and current file XML file.
  • this class is responsible for storing the XML file created from the transformation of the document. This class serves to compress the current generated XML file, remove the previous file and upload the new file to the datastore.
  • Macro Decomposer this class calculates the differences between two programmatic scripts at two time points.
  • the capture script may be integrated into an existing source code control system 40.
  • the embodiment provides a Web-based interface through which a user can gain access to data within the database.
  • This provides a central point from which all spreadsheet activity can be monitored.
  • a process running on the monitoring server 16 generates queries that are applied to the database and from the results of those queries generates pages that can be served to a Web browser by a Web server 38.
  • the range of queries and reports that can be generated is essentially without limit. These are tailored to the specific requirements of a particular user of the system.
  • the same interface can also be used for configuration and maintenance of the system. Provision of a Web-based interface to a database is a routine matter and the details will not therefore be described here further.
  • alerts based on the result of the queries. For example, an administrator may define a range of prohibited actions for some parts of a specific spreadsheet, or may wish to be alerted when the value of a particular cell moves outside of a predetermined range.
  • the system may generate and dispatch an e-mail message to a predetermined address when it detects that an alert should be issued.
  • Figure 16 is a screen that allows a user to see which files are being monitored and, where files are linked, the hierarchy of linked files.
  • Figure 17 shows a screen that allows the user to monitor how the value of a particular cell within a spreadsheet changes over time. Value is represented graphically in this screen.
  • Figure 18 shows a screen that presents a user with a summary of changes that have been made to the spreadsheet or over time. The nature of each change and the person who made it are identified.
  • Figure 19 illustrates the source code control aspect of the invention.
  • the right-hand pane shows a segment of program code in its current version, while the left- hand pane shows the previous version of the code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
EP05708347A 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents Ceased EP1719060A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0403059.9A GB0403059D0 (en) 2004-02-12 2004-02-12 Xigence version 1.0
PCT/GB2005/000529 WO2005081126A2 (en) 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents

Publications (1)

Publication Number Publication Date
EP1719060A2 true EP1719060A2 (en) 2006-11-08

Family

ID=32011746

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05708347A Ceased EP1719060A2 (en) 2004-02-12 2005-02-14 Auditing and tracking changes of data and code in spreadsheets and other documents

Country Status (4)

Country Link
EP (1) EP1719060A2 (ja)
JP (1) JP5312742B2 (ja)
GB (1) GB0403059D0 (ja)
WO (1) WO2005081126A2 (ja)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734139B2 (en) 2005-02-14 2017-08-15 Cluster Seven Limited Auditing and tracking changes of data and code in spreadsheets and other documents
US9501463B2 (en) 2005-12-08 2016-11-22 Microsoft Technology Licensing, Llc Spreadsheet cell-based notifications
GB2433614A (en) * 2005-12-23 2007-06-27 Mark Martindale Data tracking system
US8656272B2 (en) * 2006-04-28 2014-02-18 Microsoft Corporation Persisting instance-level report customizations
EP2047384B1 (en) 2006-07-31 2011-03-23 Cluster Seven Limited Storage and processing of spreadsheets and other documents
CN101127034B (zh) 2006-08-18 2012-05-23 国际商业机器公司 数据组织、查询、呈现、存档、恢复、删除、提炼方法及装置和系统
GB201509331D0 (en) 2015-05-29 2015-07-15 Cluster Seven Ltd Visualization method and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01250142A (ja) * 1988-03-30 1989-10-05 Mitsubishi Electric Corp 電子計算機におけるプログラム編集方法
JPH05108437A (ja) * 1991-10-14 1993-04-30 Nec Corp プログラムフアイル変更履歴管理方法
US5303146A (en) 1993-03-11 1994-04-12 Borland International, Inc. System and methods for improved scenario management in an electronic spreadsheet
US5806078A (en) 1994-06-09 1998-09-08 Softool Corporation Version management system
JP4003237B2 (ja) * 1996-05-10 2007-11-07 カシオ計算機株式会社 帳票出力装置
JP2001109618A (ja) * 1999-10-13 2001-04-20 Nec Eng Ltd 履歴管理ファイルシステム
JP2001256043A (ja) * 2000-03-10 2001-09-21 Toshiba Corp プログラムソースの修正履歴管理方法および修正履歴管理システム
JP2003058534A (ja) * 2001-08-09 2003-02-28 Casio Comput Co Ltd データ処理装置およびプログラム

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2005081126A2 *

Also Published As

Publication number Publication date
WO2005081126A3 (en) 2006-06-22
JP2007522575A (ja) 2007-08-09
GB0403059D0 (en) 2004-03-17
JP5312742B2 (ja) 2013-10-09
WO2005081126A2 (en) 2005-09-01

Similar Documents

Publication Publication Date Title
US9734139B2 (en) Auditing and tracking changes of data and code in spreadsheets and other documents
US10558554B2 (en) Machine learning based software correction
JP5592451B2 (ja) スプレッドシートおよびその他の文書の記憶および処理
US8464278B2 (en) Method for performing real-time analytics using a business rules engine on real-time heterogeneous materialized data views
US7974896B2 (en) Methods, systems, and computer program products for financial analysis and data gathering
US11726774B2 (en) Application programming interface for a registry
US6199070B1 (en) Using a database for program logs
KR101083488B1 (ko) 객체 모델의 영향 분석
US6370542B1 (en) Method and apparatus for knowledge acquisition and management
JP4097263B2 (ja) ウェブアプリケーションモデル生成装置、ウェブアプリケーション生成支援方法及びプログラム
US7418453B2 (en) Updating a data warehouse schema based on changes in an observation model
US20170351989A1 (en) Providing supply chain information extracted from an order management system
WO2005081126A2 (en) Auditing and tracking changes of data and code in spreadsheets and other documents
US8140479B2 (en) Logical classification of objects on a computer system
Mavlyutov et al. Dependency-Driven Analytics: A Compass for Uncharted Data Oceans.
CN113326247B (zh) 云端数据的迁移方法、装置及电子设备
US20030014557A1 (en) System and method for transforming operating system audit data to a desired format
US20090070743A1 (en) System and method for analyzing software applications
US20040093336A1 (en) Computer program method and apparatus to recognize and normalize data pattern based information
CN112948478A (zh) 基于链路的代码分析方法、装置、电子设备及存储介质
US20150006239A1 (en) System, method, and apparatus for fraud detection
US20100250621A1 (en) Financial-analysis support apparatus and financial-analysis support method
US20230099916A1 (en) Determining insights related to performance bottlenecks in a multi-tenant database system
Ordonez-Ante et al. Dynamic data transformation for low latency querying in big data systems
CN115629960A (zh) 埋点信息实时分析的方法、装置、存储介质和电子设备

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060912

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR LV MK YU

RBV Designated contracting states (corrected)

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20080418

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20171101