AU2014202494A1

AU2014202494A1 - A system and method for categorizing time expenditure of a computing device user

Info

Publication number: AU2014202494A1
Application number: AU2014202494A
Authority: AU
Inventors: Thomas Haines
Original assignee: Wisetime Pty Ltd
Current assignee: Wisetime Pty Ltd
Priority date: 2013-05-08
Filing date: 2014-05-08
Publication date: 2014-11-27
Also published as: AU2014202495B2; AU2014202495A1

Abstract

The present invention relates to the analysis and categorization of time expenditure of users of computing devices having a graphical user interface. A computer implemented method for categorizing time expenditure of a computing device user is provided, comprising detecting an item of content has been added to one or more digital content repositories; selecting a content identifier from the item of content; determining whether text of the content identifier contains an analysis identifier, the analysis identifier having an identification string, and whereupon an analysis identifier is determined to be absent from the content data of the content identifier building an analysis identifier, and altering the content data of the content identifier to include the built analysis identifier; and receiving a plurality of user activity data records, each user activity data record associated with a user identifier, and each user activity data record containing a time indicator and the active window title of the computing device at or during that time indicator.

Description

"A system and method for categorizing time expenditure of a computing device user"
Technical Field
[01] The present invention is related to the field of categorizing time expenditure of a user. More particularly, the present invention is related to the field of analysis and categorization of time expenditure of the users of computing devices having a graphical user interface.
Summary of Invention
[02] According to a first aspect of the present invention, a computer implemented method for categorizing time expenditure of a computing device user is provided, comprising: detecting an item of content has been added to one or more digital content repositories; selecting a content identifier from the item of content; determining whether text of the content identifier contains an analysis identifier, the analysis identifier having an identification string, and whereupon an analysis identifier is determined to be absent from the content data of the content identifier building an analysis identifier, and altering the content data of the content identifier to include the built analysis identifier; and receiving a plurality of user activity data records, each user activity data record associated with a user identifier, and each user activity data record containing a time indicator and the active window title of the computing device at or during that time indicator. By altering the content identifier to include an analysis identifier, the accuracy by which user data records received can be categorized or otherwise analyzed is improved.
[03] Preferably, selection of the content identifier is based at least in part on the file type of the item of content. By basing the selection of the content identifier at least in part on the file type, the likelihood that the analysis identifier will appear in the window title is improved.
[04] The operation of detecting an item of content has been added to one or more digital content repositories may further comprise detecting an item of content has been requested from the one or more digital content repositories.
[05] The content identifier may be selected from the group consisting of: the filename of the item of content, and data located within the item of content according to the syntax of the file type of the item of content, with at least one data type to locate being predetermined.
[06] Preferably, the content data of the content identifier is altered by inserting the analysis identifier at the start of, or near the start of, the content data of the content identifier.
[07] By inserting the data near the start of the content identifier, the analysis identifier is less likely to fall within a truncated portion when the content identifier is displayed in the window title.
[08] Preferably, the analysis identifier further includes an entity identifier.
[09] The entity identifier reduces the probability of cross-domain conflicts, and improves the accuracy of the content identifier.
[10] The analysis identifier preferably includes a version identifier, which is used to enable multiple analysis identifier formats to be adopted concurrently.
[11] A checksum portion may preferably be used in the analysis identifier. This also reduces the probability of cross-domain conflicts, and improves the accuracy of the content identifier.
[12] Preferably, the method further includes the operation of presenting a timeline of user activity data records for a given time period and user identifier, via a graphical timeline user interface.
[13] Preferably, each user activity data record further comprises zero or more categorization identifiers.
[14] Preferably, the graphical timeline user interface is adapted to allow the user to associate a user activity data record with one or more matters, aided by the information contained in the temporally adjacent time entries.
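The following Python sketch illustrates one way the absent-identifier check and the insertion near the start of the content identifier could fit together. The identifier layout `{W<version>-<entity>-<identification string>-<checksum>}`, the function names, and the use of a truncated HMAC-SHA-256 as the checksum are all assumptions made for illustration; the specification does not prescribe them.

```python
import hashlib
import hmac
import re
import secrets
from typing import Optional

VERSION = "1"  # hypothetical version identifier
# Hypothetical layout: {W<version>-<entity>-<identification string>-<checksum>}
PATTERN = re.compile(r"\{W(\d+)-([A-Z0-9]+)-([A-Z0-9]+)-([0-9a-f]{4})\}")


def checksum(identification: str, security: bytes) -> str:
    """Short checksum over the identification string (assumed: truncated HMAC)."""
    digest = hmac.new(security, identification.encode(), hashlib.sha256)
    return digest.hexdigest()[:4]


def build_identifier(entity: str, security: bytes,
                     identification: Optional[str] = None) -> str:
    """Build an analysis identifier, generating a random string if none is given."""
    ident = identification or secrets.token_hex(4).upper()
    return f"{{W{VERSION}-{entity}-{ident}-{checksum(ident, security)}}}"


def find_identifier(text: str, security: bytes) -> Optional[str]:
    """Return the first analysis identifier in text whose checksum is valid."""
    for match in PATTERN.finditer(text):
        if hmac.compare_digest(match.group(4), checksum(match.group(3), security)):
            return match.group(0)
    return None


def ensure_identifier(content_identifier: str, entity: str, security: bytes) -> str:
    """Prefix an analysis identifier only when a valid one is absent."""
    if find_identifier(content_identifier, security):
        return content_identifier
    return f"{build_identifier(entity, security)} {content_identifier}"
```

Calling `ensure_identifier("Lease draft.docx", "ACME", b"secret")` would return the filename prefixed with a new identifier, and calling it again on the result leaves the name unchanged. Placing the identifier first reflects the truncation concern noted above.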
[15] Preferably, the method further comprises the operation of removing an analysis identifier from the content data of the content identifier of an item of content prior to the item of content being transmitted to another entity via a public network.
[16] Preferably, whereupon an item of content is detected as having been added to a digital content repository, and the content data of the content identifier having been determined to contain an analysis identifier, the method further comprises sending identifying particulars of the item of content to an analysis identifier repository manager.
[17] The operation of building an analysis identifier may comprise receiving an identification string, or alternatively, the operation of building an analysis identifier comprises generating a probabilistically unique identifier.
[18] According to another aspect of the present invention, a system for categorizing time expenditure of a computing device user is provided, comprising a content detection module, to detect an item of content has been added to one or more digital content repositories; a content identifier selection module, to select a content identifier from the item of content; an analysis identifier determiner, to determine whether content data of the content identifier contains an analysis identifier, the analysis identifier having an identification string; an analysis identifier builder to build an analysis identifier for the item of content; a content identifier alteration module, to alter the content data of the content identifier to include the analysis identifier built by the analysis identifier builder; a user activity data processing module adapted to receive a plurality of user activity data records, each user activity data record associated with a user identifier, and each user activity data record containing a time indicator and the active window title of the computing device at or during that time indicator; and whereupon the analysis identifier determiner having determined that an analysis identifier is absent from the content data of the content identifier, the analysis identifier builder creating an analysis identifier, and the content identifier alteration module thereafter altering the content data of the content identifier to include that analysis identifier. By altering the content identifier to include an analysis identifier, the accuracy by which user data records received can be categorized or otherwise analyzed is improved.
[19] Preferably, the content identifier selected is based at least in part on the file type of the item of content.
[20] By basing the selection of the content identifier at least in part on the file type, the likelihood that the analysis identifier will appear in the window title is improved.
[21] Preferably, the content detection module detects when an item of content has been requested from one or more digital content repositories.
[22] By checking for requests, the insertion of the analysis identifier may be performed on-demand, such as in the case of an HTML proxy.
[23] The content identifier may be selected from the group consisting of: the filename of the item of content, and identification data contained within the item of content located according to the syntax of the content type.
[24] Preferably, the content identifier alteration module inserts the analysis identifier at the start of, or near the start of, the text of the content identifier.
[25] By inserting the data near the start of the content identifier, the analysis identifier is less likely to fall within a truncated portion when the content identifier is displayed in the window title.
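As a rough illustration of selecting a content identifier according to file type, the Python sketch below picks the field most likely to surface in a window title. The mapping is an assumption: an HTML document's `<title>` element is chosen because browsers typically show it in the window title, an `.eml` file's Subject header is chosen on the assumption that mail clients show it, and everything else falls back to the filename. The function and label names are hypothetical.

```python
from email import message_from_bytes
from html.parser import HTMLParser
from pathlib import PurePath
from typing import Tuple


class _TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def select_content_identifier(filename: str, data: bytes) -> Tuple[str, str]:
    """Return (kind, value) for the identifier most likely to reach the window title."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in (".html", ".htm"):
        parser = _TitleParser()
        parser.feed(data.decode("utf-8", errors="replace"))
        if parser.title.strip():
            return ("html-title", parser.title.strip())
    if suffix == ".eml":
        subject = message_from_bytes(data)["Subject"]
        if subject:
            return ("email-subject", str(subject))
    # Fallback for every other type, and for files lacking the preferred field.
    return ("filename", PurePath(filename).name)
```

The fallback to the filename keeps the selection total: every item of content yields some identifier, even when the type-specific field is missing.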
[26] Preferably, the analysis identifier includes a checksum portion; and the system further comprises a checksum calculator to generate a checksum value based on the identification string, wherein the checksum calculator is operatively coupled with the analysis identifier determiner to assess the validity of a checksum value, and operatively coupled to the content alteration module for building the analysis identifier.
[27] This also reduces the probability of cross-domain conflicts, and improves the accuracy of the content identifier. The checksum value includes output from a cryptographic hash function of the identification string and a security string.
[28] Preferably, the user activity data store is adapted to associate each time entry with zero or more categorization identifiers.
[29] Preferably, the system further comprises a timeline presentation module to present a timeline of user activity data records for a given time period and user identifier, via a graphical timeline user interface.
[30] Preferably, the timeline presentation module provides for the user to associate a user activity data record with one or more categorization identifiers.
[31] Preferably, the system further comprises an active window monitor module, to monitor the active window title of a computing device, by registering to receive user input events of the computing device using a windowing system, and obtaining the window title of the foremost window of the computing device when user input events are received; and whereby the active window monitor module determines whether the window title contains an analysis identifier, and whereupon an analysis identifier is absent, finds an item of content that correlates to the active window.
[32] Preferably, the analysis identifier builder requests the identification string from an analysis identifier repository manager.
The analysis identifier repository manager maintains an analysis identifier data store comprising a list of issued identification strings, and associated with each said issued identification string a list of one or more content repository identifier records, each content repository identifier record containing a repository identifier and a content identifier.
[33] Preferably, whereupon an item of content is detected as having been added to a digital content repository, and the text of the content identifier having been determined to contain an analysis identifier, the content detection module sends the identifying particulars of the item of content and the identification string to the analysis identifier repository manager.
Brief Description of Drawings
[34] These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
[35] Fig. 1 is a flow chart of the algorithmic operations of the content selector module according to one embodiment.
[36] Figure 2a illustrates a method of using a server-side email trigger for the content detection module to detect an item of content has been added to a digital content repository in the form of an email server, according to one embodiment.
[37] Figure 2b illustrates a method of using an IMAP-client processor as an alternative method for the content detection module to detect an item of content has been added to a digital content repository in the form of an email server.
[38] Figure 2c illustrates the process followed by a content detection module upon receiving an email message event, according to one embodiment.
[39] Fig. 3 illustrates registering as a listener of document events with a document management server, and processing subsequent document events from the document management server, according to one embodiment.
[40] Fig. 4 is a flow chart of algorithmic operations of an active window monitor module, including user activity data store/record synchronization to an upstream server, according to an embodiment.
[41] Fig. 5a is a block diagram illustrating one embodiment of a system for data relating to time expenditure of a user.
[42] Fig. 5b illustrates use of multiple computing devices by a user, as an expansion of the illustration in Fig. 5a.
[43] Figs. 6a-6c illustrate sample analysis identifier patterns.
[44] Fig. 7a is an illustration of the elements of an analysis identifier according to one embodiment, including example content data of a content identifier.
[45] Fig. 7b is a flow chart of algorithmic operations to calculate a checksum portion according to one embodiment.
[46] Fig. 8 is an entity-relationship diagram of an analysis identifier data store SQL database according to one embodiment.
[47] Fig. 9 is a flow chart of algorithmic operations of an active window monitor module according to an embodiment.
[48] Fig. 10 is a flow chart of algorithmic operations performed by an analysis identifier determiner according to one embodiment.
[49] Fig. 11 is a flow chart of algorithmic operations of a content identifier alteration module according to one embodiment.
[50] Figs. 12a-12c illustrate an interface provided by a timeline presentation module according to one embodiment. The 'user drag GUI actions' demonstrate the timeline presentation module as three still frames.
[51] Fig. 13 is a flow chart of algorithmic operations performed by a user activity data processing module and timeline presentation module according to one embodiment.
[52] Fig. 14 is a flow chart of algorithmic operations of an inactivity notification module according to one embodiment.
[53] Fig. 15 illustrates one method of locating an RFID tag user device by basing the calculation on the RSSI of one or more fixed long-range RFID readers, using a preferred triangulation-based calculation as per one embodiment.
[54] Fig. 16a represents the 'Cocoa' class needed to capture user input events on Mac OS X.
[55] Fig. 16b is a sample method that can be used to capture the active application name and window title, using 'AppleScript' on Mac OS X.
[56] Fig. 17 is an illustration of an item of content in the form of an email message.
[57] Fig. 18 illustrates a set of sample user activity data records.
[58] Figs. 19a-c illustrate example identifier records stored in an Analysis Identifier Data Store, as described in the ER diagram appearing in Fig. 8.
Description of Embodiments
[59] Although the invention is described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive. The specification makes reference to example categories of users (preferably knowledge workers) such as accountants and attorneys. These examples are for illustrative purposes only, and should not be considered as restricting the scope of application of the invention to any given type or genus of worker. The invention may preferably have utility in any organisation that employs users that are engaged to complete various tasks, preferably using computing devices.
[60] The term 'regular expression' as used herein, is with reference to the computing term, namely a specific pattern that provides a concise and flexible mechanism to 'match' (specify and recognize) strings of text, such as particular sequences of characters, words, or patterns of characters. It is also referred to as a 'regex' statement in computing nomenclature.
[61] Reference to terms such as 'matter', 'case' and 'project' can refer to any categorizable segment of labor expenditure or any definable logical grouping of labor expenditure, including but not limited to ongoing tasks such as training and personal development, discrete tasks, sub-groups of an organization, legal disputes, reports or set deliverables.
The description of the embodiment(s) may refer to a categorization identifier in the form of a matter identifier.
[62] The described aspects of the invention may also be implemented as a computer-controlled apparatus, a computer process, a computing system, an apparatus, or as an article of manufacture such as a computer program product or computer-readable medium. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and/or encoding a computer program of instructions for executing a computer process.
[63] In the present embodiment, there are detailed descriptions of implementation details with reference to specific operating systems. The skilled reader will appreciate that any methods or functions defined with reference to a specific operating system may have equivalent or analogous operations available in other operating systems. While the equivalent or analogous operations may differ in name and in the underlying implementation, alternatives may be used to achieve similar effect to that described.
[64] The present embodiment describes a sequence of modules, calculators, managers and data stores. These elements can be run across a plurality of computing devices 14 that are connected via one or more computer networks 24. They may be implemented using preconfigured digital circuits or circuitry, or using precompiled or pre-sequenced software that is executed on digital circuitry. The elements could be grouped into a single computing device 14, or a single application running on the computing device 14.
The description of the system and method with reference to logical entities is to aid the reader's understanding of the system and method. The description and figures are to illustrate and inform, and the segmentation of responsibility of the elements, or general arrangement of responsibilities of the elements, may be varied according to the specific technical considerations of each implementation.
[65] Organizations typically store their information in one or more digital content repositories 10. There are numerous digital content repositories 10 in use by industry, such as, for example, file servers, document management systems (often referred to also as a content management system or variations thereof), database servers, distributed file systems and email servers. The term digital content repository 10 may include any system or apparatus that enables items of content to be stored and retrieved, and preferably each item of content has some form of content identifier (for example, this could be the combination of a volume identifier, a path identifier and a file name identifier, or a database primary key) or a probabilistically unique key (e.g. an email ID) or some combination of identifiers, such that the document can be viewed or retrieved by using that identifier. Two common forms of basic content identifiers are: (i) the combination of a volume identifier, and a file path/name identifier; and (ii) a table identifier and a table primary key value, usually in the form of an incrementing sequential integer.
[66] A data collection system 8 and data collection method at 1000 for collecting data relating to time expenditure of a user is herein described. An overview of the definitive data collection system 8 is illustrated in Fig. 1. Preferably, the user is a knowledge worker.
[67] In Fig. 1, the letter N enclosed in a circle denotes that each computer device 14 may be connected to N digital content repositories 10, that is, one or more digital content repositories 10. Preferably, the computer device 14 may be connected to a digital content repository 10. Examples of digital content repositories 10 include a digital content repository 10 stored on the local storage medium of the computer device 14, a digital content repository 10 that may be accessible by the computing device 14 from time to time via a communications network, or an ephemeral digital content repository 10 stored in the memory of the computer device 14.
[68] The data collection system 8 includes a content detection module 12 to detect an item of content has been added to one or more digital content repositories 10.
[69] Several implementations for detecting that an item of content has been added to a digital content repository are described herein. The methods and systems used to detect that an item of content has been added to a digital content repository may vary to account for the specific implementation details of the digital content repository 10 being monitored.
[70] It will be appreciated that there are numerous digital content repositories 10 in use, including but not limited to Apple File Protocol (AFP), Nuxeo™, Postfix Email Server, Outlook™ Email Server, Google™ Email Server, SharePoint™, ZFS, EXT4, FAT32, FileNet™, Alfresco™ and Documentum™. The content of the digital content repositories may be stored and accessed from a local storage medium, or may be stored remotely and accessed via a network. There are many examples of local storage technologies using a tree of nodes/directories and filenames, such as FAT32 and HFS+. There are also hybrid systems such as network attached storage devices. In all of these cases, the system that saves or retrieves a given item of content uses at least one content identifier, such as the file path or a uniform resource locator (URL).
[71] A content detection module 12 may run on one or more computing devices 14.
[72] Following is a description of different approaches that may preferably be adopted when a content detection module 12 is monitoring a digital content repository in the form of a local file system, a Postfix email server, and a Nuxeo™ content repository <www.nuxeo.com>.
[73] A preferred embodiment of the content detection module 12 operates to detect an item of content having the form of an email message. An email message is added to (i.e. received by) a digital content repository 10 in the form of an email server 20. Preferably, the content detection module 12 may use one or both of the following techniques: a server-side email trigger 16, and/or an email-client process 18. Preferably, the email-client process 18 uses the IMAP protocol to communicate with the email server 20.
[74] The server-side email trigger 16 preferably offers the ability to detect a new email message has arrived before users have been notified of the arrival of that new message.
Within the context of the present invention, the server-side email trigger 16 is the generally preferred approach where such access is possible and commercially feasible, in part because it may, in some circumstances, provide a marginal improvement in the accuracy of the raw data collected by the system, as the email message can be assured to have an analysis identifier 80 embedded within its preferred content identifier prior to the email message being handled by the user (e.g. reading the message or its attachments, or responding thereto).
[75] Some servers support trigger mechanisms within their frameworks that can be used to implement the server-side email trigger 16. For example, the Postfix application or MS Exchange can activate the trigger in regard to all email deliveries, or all emails meeting some predefined criteria.
[76] The architectural decision of which technique is used by the content detection module 12 to detect that an email has been added to the email server 20 will depend on whether the system 8 is permitted to alter the operation of the email server. Whilst the server-side email trigger 16 does offer some advantages, the IMAP-client processor 18 provides broad compatibility with most systems as it mimics standard email client behaviour.
[77] The server-side email trigger 16 may be preferable over the IMAP-client processor 18 in cases where it is impractical or undesirable for the individual security credentials of users (e.g. username / password) to be known by the content detection module 12 to facilitate access to the IMAP server.
[78] The server-side email trigger 16 is described with reference to the Postfix email server 20 <www.postfix.org>. With reference to Fig. 2a, the content detection module 12 registers itself as an event listener with the email server 20, at 1002.
[79] The registration step at 1002 should preferably include a request that the content detection module 12 be notified of any email message that has been queued for sending via the email server 20 (outbound), and of any email having been received for processing by the email server 20 (inbound). In some configurations, the inbound and outbound email server 20 responsibilities may be divided between different servers or processes, and in this case, the registration step at 1002 should register with all respective processes that assume inbound and outbound email delivery responsibility.
[80] On the Postfix email server, a before-queue content filter can be utilized to implement the server-side email trigger 16 technique. The content detection module 12 registers to use the before-queue content filter to perform the function of inspecting all new mail messages prior to the messages progressing to the Postfix mail delivery queue. The configuration file of Postfix should be modified to include the necessary information to identify the filter provided by the content detection module 12. As a further implementation example, when expressed in the language used by the MS Outlook Email Server, an outgoing email can be intercepted using an 'event sink' <http://support.microsoft.com/kb/317680>.
[81] The word 'intercept' in the context of the email can be understood to mean that the email message is detected prior to it being sent to a user (outbound) or received by a user (inbound).
[82] The process used to implement an IMAP-client processor 18 technique is illustrated in Fig. 2b. Firstly, the content detection module 12 connects to the email server 20 at 1004 via a computer network 24. Upon the TCP handshake having been completed, a user's security credentials, in the form of a username and password, are submitted at 1006 in accordance with the syntax of the IMAP protocol.
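A minimal sketch of the IMAP-client technique, using Python's standard `imaplib`, is given here under several assumptions: the connection uses TLS (`IMAP4_SSL`), the mailbox is opened read-only, recent messages are found with a `SEARCH RECENT` command issued periodically, and the set `seen` stands in for a local record of identifiers already detected. The function names, the 60-second interval and the `handle` callback are hypothetical.

```python
import imaplib
import re
import time

# Pulls the identifier out of a "Message-ID: <local@host>" header line.
MESSAGE_ID = re.compile(rb"^Message-ID:\s*<?([^>\s]+)>?", re.IGNORECASE | re.MULTILINE)


def connect(host, username, password, mailbox="INBOX"):
    """Connect (1004) and authenticate (1006). TLS is an assumption here."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(username, password)
    conn.select(mailbox, readonly=True)
    return conn


def poll_once(conn, seen):
    """Search for recent messages and return Message-IDs not seen before."""
    new_ids = []
    status, data = conn.uid("SEARCH", None, "RECENT")
    if status != "OK" or not data or not data[0]:
        return new_ids
    for uid in data[0].split():
        status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])")
        if status != "OK":
            continue
        for part in parts:
            if not isinstance(part, tuple):
                continue  # skip the closing b")" element of the response
            match = MESSAGE_ID.search(part[1])
            if match:
                message_id = match.group(1).decode("ascii", errors="replace")
                if message_id not in seen:
                    seen.add(message_id)
                    new_ids.append(message_id)
    return new_ids


def run(host, username, password, handle, interval=60):
    """Poll, sleep, check the connection, and reconnect if it has dropped."""
    seen = set()
    conn = connect(host, username, password)
    while True:
        try:
            for message_id in poll_once(conn, seen):
                handle(message_id)
            time.sleep(interval)
            conn.noop()  # liveness check; raises if the connection is gone
        except (imaplib.IMAP4.error, OSError):
            conn = connect(host, username, password)
```

`BODY.PEEK` is used so that fetching the header does not mark the message as read, which keeps the monitor from interfering with the user's own mail client.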
[83] Once authentication has occurred, the content detection module 12 requests recent email messages from the email server 20 at 1008.
[84] By way of background for the reader, the IMAP protocol provides server-initiated mailbox status updates. The IMAP protocol assumes that the IMAP client wants to know when new mail messages are received, or when another IMAP client changes the flags of a message or removes a message from the mailbox. There are two primary commands used to receive new messages ('SEARCH NEW' or 'SEARCH RECENT'). Under the IMAP protocol, any time that a client performs an operation on a mailbox, the server can append additional information to its response to the client, for example, "you now have N messages" or "message M now has such-and-such flags set". The IMAP protocol is referred to as a "line oriented" protocol. The conversation between the IMAP client and IMAP server is transmitted in the form of character strings that end with CRLF. That is, a command is sent as a line of text to the server, and the server returns its response as a line of text. In order to monitor and receive new messages, a 'SEARCH NEW' or 'SEARCH RECENT' command can be sent to the IMAP server periodically to check if any new messages are available, followed by a 'FETCH' command to get the message details including the message UID.
[85] After checking for any new email messages at 1008, the content detection module 12 preferably sleeps for a predetermined period of, for example, 60 seconds at 1010, and checks that the connection remains alive at 1012.
[86] If the connection to the email server 20 remains alive, the process returns to the sleep state at 1010. If the connection to the email server 20 has dropped, then the connection process step at 1004 is initiated again.
[87] The content detection module 12 preferably keeps a local store of the identifier of each email message that it has previously detected, in a Content Detected Record Store 26.
This can be used to improve the performance of the content detection module 12, by reducing the number of queries that need to be made of the email server 20.
[88] An email message event occurs at the email server 20 at 1014. An email message event in this context preferably serves to notify the content detection module 12 that an email message has been received at the email server 20 or that a message has been added to another IMAP folder such as the sent items folder of the user's mailbox. The email message associated with the email message event is delivered to the content detection module 12 via an inter-process communication and subsequently via the computer network 24.
[89] Upon the email message event being received by the content detection module 12, preferably the elements of the email message are parsed at 1016.
[90] By way of background for the reader, an email message contains an email message header and an email message body. The email message header contains a sequence of header fields. These fields should be parsed using the syntax defined in RFC 822 into an array of header values. In the message body of an email message, in addition to the text (or html) portion of the email message displayed to the reader, there can additionally be items of content embedded within the email message body. For example, an email may contain a PDF file, and an MS Word word processing file.
[91] The process determines if the message body contains any items of content in the form of file attachments at 1018. This determination may exclude certain attachments or elements. For example, if an image file is directly referenced in the html portion of the email message, the system may choose to discard it, as it is probable that the given attachment is an embedded image, and will not be treated by the user as an item of content that is opened or modified.
[92] If file attachments are found at 1018, the process extracts the attachments from the email message at 1020.
These file attachments will be treated as separate items of content by the system to the extent of modifying the content identifier as required. However, how this modification is in turn saved back into one or more digital content repositories 10 requires special logic relating to items of content that are embedded within an email message. This is because the item is embedded within the logical grouping of another item of content. Handling encapsulated files located within compressed files such as a zip file is a consideration of the same nature, but with different implementation specifics.
[93] The process at 1022 extracts the 'messageId' associated with the email message. The messageId is contained in the header portion of the email message, as per RFC 822 (Standard for ARPA Internet Text Messages). An email 'messageId' serves as a probabilistically unique identifier. The string can be expressed by the regex statement "^(\S+)@(\S+)$", whereby the first grouping is a randomly or sequentially generated string of characters, and the second grouping is the name of the host that formed the email message.
[94] The text of the message body and/or contents of the attachments may be encrypted. As the standard email headers are not encrypted, in the case of an encrypted email message, the system can still proceed with further processing, as outlined further herein.
[95] The item of content in the form of the email message, and all associated meta data that concerns the location or identity of that email message, such as the server it was retrieved from and the one or more users that received the email message, is passed to the analysis identifier repository manager 28 (discussed later herein) to preferably assist in subsequently identifying the item of content if subsequent time analysis is required by the system or user.
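The parsing steps at 1016 to 1022 can be sketched with Python's standard `email` package. This is an illustration under assumptions, not the specified implementation: the function name and the returned dictionary are hypothetical, an attachment is taken to be any non-multipart part carrying a filename, and an image is treated as embedded when its Content-ID is referenced from the HTML part as a `cid:` URL.

```python
import re
from email import message_from_bytes, policy

# Message identifier pattern: first group is the generated string,
# second group is the name of the host that formed the message.
MESSAGE_ID_RE = re.compile(r"^(\S+)@(\S+)$")


def parse_email_event(raw: bytes) -> dict:
    """Parse headers, extract the message identifier and any file attachments."""
    msg = message_from_bytes(raw, policy=policy.default)

    message_id = str(msg["Message-ID"] or "").strip().strip("<>")
    match = MESSAGE_ID_RE.match(message_id)

    html_part = msg.get_body(preferencelist=("html",))
    html = html_part.get_content() if html_part is not None else ""

    attachments = []
    for part in msg.walk():
        if part.is_multipart() or part.get_filename() is None:
            continue
        content_id = str(part["Content-ID"] or "").strip().strip("<>")
        if content_id and f"cid:{content_id}" in html:
            continue  # assumed to be an embedded image, not a user document
        attachments.append((part.get_filename(), part.get_payload(decode=True)))

    return {
        "message_id": message_id,
        "local": match.group(1) if match else None,
        "host": match.group(2) if match else None,
        "attachments": attachments,
    }
```

Because only headers are needed for the identifier, the header fields remain usable even when the body or the attachments are encrypted.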
[96] The email message event may also include contextual information, such as the location of the email message within the directory-like IMAP folder structure of the user's mailbox. For some email servers, this notion of folders is an artificial construct that is provided when responding to requests made via the IMAP protocol. For example, the Google Email server allows a single email message to be associated with multiple 'folders', due to its label approach to sorting email messages (this notion of a label is also referred to as a tag on some services). That is, an email message can belong to more than one label (and by analogy in the IMAP protocol more than one folder). [97] A record of the messageId that has been detected is stored in a content detected record store 26 at 1024. The content detection module for detecting an item of email content completes at 1026. [98] If the process has not been run previously for the given user, or the connection was terminated, the process should preferably iterate over messages that have not been detected previously by referencing each item of content against the content detected record store 26. [99] The content detection module 12 can preferably detect an item of content in the form of an item of content added to a content management platform. [100] The Nuxeo™ content management platform is adopted in the present embodiment as a digital content repository 10, to detect that an item of content has been added thereto. The content detection module 12 extends the Nuxeo™ content platform, by installing a module into the platform to register to receive events (see the 'Events and Listeners' section of the Nuxeo NXDOC documentation). The use of extension points is the method recommended by Nuxeo™ with respect to receiving system events. [101] Nuxeo uses WebDAV as an interface method, and hence can be mounted as a shared drive on Windows, MacOS X and Linux operating systems, as would be understood by a person skilled in the art.
There are numerous other examples of file servers and configurations that could be adopted to facilitate this aspect of the invention. For a user that does not have a need to share data in a collaborative environment, local hard drive storage could be used as the primary digital content repository 10. [102] The Nuxeo Platform follows the event and event-listener object-orientated programming model, whereby application code can register to receive specific events concerning the given Nuxeo document repository. [103] Preferably, the module registers at 1028 with the 'EventServiceComponent' of the document management server as an asynchronous event processor, specifically to receive document creation events and document modification events. When an item of content in the form of a document is created or modified, the content detection module 12 is notified of the event. The event is dispatched by the Nuxeo event service of the Nuxeo server 30. [104] When the content detection module 12 receives the event at 1030, it preferably extracts meta data connected with the document, including the document object that the event concerns, the logical path of that document (i.e. the location), a unique document identifier (a primary key), and the file name of the document. The identifying particulars of the document can be obtained via the objects contained in the event referenced in the event notification object. [105] In some instances, the administrator of the system 8 may preferably define directories or paths that should not be processed. For example, if the computing device 14 is configured to store all of its application preferences in the document management server 30, that user space may be excluded. Based on any exclusions where an exclusion list is specified (or specific inclusions in the case of an inclusion list), a decision of whether to process the document is made at 1032.
Thereafter, the event is discarded at 1034 if excluded, or the process otherwise continues. [106] From a best practice standpoint particularly relevant to the legal industry, it is advisable to use read-only document templates for precedent documents, so that the knowledge worker must create a new document and save that new document to a suitable location whenever they are using a precedent document to generate a new legal document on a given matter. [107] The content detection module 12 preferably includes a third mechanism to detect that an item of content has been added to a digital content repository 10 that is the local hard drive of the computing device 14, and/or a networked file server that is accessible to applications via the file system of the computing device 14. For example, in the Microsoft Windows™ operating system, a networked file system can be mounted to a drive letter (e.g. T:\), and a local hard drive can be mounted to another drive letter (e.g. C:\). [108] The embodiment preferably includes an active window monitor module 32 (also described as taking the form of a 'window title monitor module'). The active window monitor module 32 is responsible for querying, polling or otherwise ascertaining the active (i.e. the 'foremost') activity that the user is engaged with on a given computing device 14. This usually takes the form of a 'window', and each window preferably contains information about the application that created the window, and information about the contents of the window. This information could take the form of a visual textual string presented to the user, or it could be hidden from the user but available programmatically, or via alternative means such as audible information that may be played to the user. [109] A set of preferred steps of the active window monitor module 32 are illustrated in Fig. 4. The 'event receiver thread' process starts at 1440. The active window monitor module 32 receives a plurality of user input events at 1442.
This is explained further below with reference to Fig. 9. As each user input event is received, it is transformed into a user activity record by adding the username and removing certain particulars (described in another portion of the specification), and is preferably saved to a local storage location, such as the memory or disk storage of the computing device 34 at 1444. In some limited cases, such as where the user activity data store 58 is hosted locally on the computing device 34, it may be preferable to save each user input event to the user activity data store 58 as it is received. [110] A second thread of the active window monitor module 32 is illustrated in Fig. 4 in the bottom rectangular box labelled 'Uploader Thread'. The Uploader Thread runs periodically, starting at 1446. At 1448 it checks the local storage location where the user activity records are stored for any records that are pending (in the sense that they have not been uploaded to the user activity data store 58). Any records that are found are compressed into a bundle of records and uploaded to the user activity data store 58 at 1450. If the upload was successful at 1452, the user activity data record(s) are marked as complete at 1454, and the process completes at 1456. If the upload was not successful at 1452, the process completes at 1456 without marking the records, so the upload is attempted again the next time the process runs. [111] With reference to Fig. 9, the active window monitor module 32 preferably registers to receive user input events from the host operating system of the worker's computing device 14 at 1038. It may also start an inactivity timer thread. An example method of registering to receive user input events is shown in Fig. 16a.
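By way of illustration only, a single pass of the Uploader Thread (steps 1446 to 1456) might be sketched as follows. The `ActivityRecord` fields, the `run_uploader_once` name, the use of gzip-compressed JSON for the bundle and the `upload` callable are all assumptions made for this sketch; the description does not specify a bundle format or transport.

```python
import gzip
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class ActivityRecord:
    """Hypothetical shape of a locally stored user activity record."""
    username: str
    timestamp: float
    window_title: str
    uploaded: bool = False


def run_uploader_once(
    local_store: List[ActivityRecord],
    upload: Callable[[bytes], bool],
) -> int:
    """Bundle pending records, upload them, and mark them complete on success.

    Returns the number of records marked as complete in this pass.
    """
    pending = [record for record in local_store if not record.uploaded]
    if not pending:
        return 0

    # Compress the pending records into a single bundle (format assumed).
    payload = json.dumps([asdict(record) for record in pending]).encode("utf-8")
    bundle = gzip.compress(payload)

    if not upload(bundle):
        # Records stay pending, so the next periodic run retries them.
        return 0

    for record in pending:
        record.uploaded = True
    return len(pending)
```

Leaving failed records unmarked, rather than retrying inside the same pass, keeps the retry behaviour tied to the periodic schedule, as the description suggests.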
[112] The active window monitor module 32 generates user activity data records, and preferably may detect if an item of content has been added to a local or mounted file system operatively coupled to a computing device 14. The generation of user activity data records, that is, rows of data relating to user activity, as provided by the active window monitor module 32 is described in further detail later in the description with reference to capturing user activity data records (a record may also be described using the term row). [113] With reference to Fig. 4, upon the active window monitor module 32 receiving user input events 36, the application owner of the foremost window (also called the active window) and the window title of the active window are obtained from the computing device operating system 34 at 1040. This combination may also be referred to as the 'active application name and window title' (AANWT). A user input event 36 may preferably include the mouse being moved or clicked, keys on a keyboard being pressed, or a touch-screen or track-pad being touched by the user. [114] The term window 'ownership' or 'owner' describes the application that created and/or is responsible for instructing the operating system what should be rendered in that window (i.e. the active application). The mechanics of how to obtain these particulars are described later herein with reference to using the AppleScript engine of the OS X operating system. [115] In addition to the content detection module 12 detecting that an item of content has been added to one or more digital content repositories 10, the content detection module 12 of the present embodiment may preferably provide another mechanism, in the form of a proxy service, to detect when an item of content has been 'requested' from one or more digital content repositories 10. This is most applicable for data that is ephemeral in nature, such as web page content.
One example of how this is achieved is when a web application or web proxy receives a request from the computing device 14 of a knowledge worker. The knowledge worker is authenticated by way of an authenticated session, in the usual manner. [116] In implementing the proxy server, the content detection module 12 may preferably be embedded into a web application directly or via a web proxy, and can detect the request for an item of content from the knowledge worker's computing device 14 in the form of an HTTP request. Subsequently, the content identifier selection module 44 (described further below) selects the title element of the HTML content as the content identifier from this type of item of content. [117] Thus in the context of user web browser usage, it is possible for the content detection module 12 to take the form of a web proxy or a web-based application which, upon serving each response, includes an analysis identifier in the text of the HTML title tag (the content identifier). This preferably enables otherwise transitory analytics to be persisted for subsequent analysis as required. [118] A content identifier selection module 44 is provided to select a content identifier 66 from the item of content 68. An item of content 68 may include a document, a file, an email, a set of geographic co-ordinates, a collection of data, an image file, a sound file, or any digital representation of information. [119] In one embodiment, the file type is preferably used to determine which content identifier 66 to select from the item of content 68. Each item of content 68, such as a data file, in the usual case, has a file type associated with it. There are numerous file types on any system, and examples include a word processing file, an email file, an image file, a CAD file, a sound file, an HTML file, or a source code file. [120] By convention, the file type is usually determined by a suffix appearing at the end of the file name after the last period '.' character.
The file type is used to determine how to find and interpret information that is contained within the data file. [121] To elaborate by example, and with reference to Fig. 17, an item of content 68 with a filename 70 of "4298B.AF2663.eml" is shown. This can be broken into three logical portions: a) the descriptive portion 72 of the filename 70, b) the last period separator 74 of the filename 70 and c) the extension portion 76 of the filename 70. The item of content 68 may also have other meta-data 78 associated with it. [122] Fig. 1 illustrates the operations performed by an embodiment of the content selector module 44. The content selector module 44 at 1060 examines the filename 70 of the item of content 68. To determine if a file extension is defined, a filename regex statement of "(.*)(\.(.*))?" can preferably be applied to the filename 70. The second group of the regex will contain the extension portion 76 of the filename 70. In the case that the filename regex statement matches, the file extension is available at 1062. Conversely, if there is no match, no file extension is available at 1062. According to one embodiment, in the case that the file extension is not in the filename 70, the content selector module 44 preferably examines the binary content of the item of content 68 at 1064, in order to identify the file type. A third party system may be utilised if examining the binary content of the item of content 68 is desired. [123] The file type extension is extracted from the filename regex statement at 1066 (or alternatively, from the result of examining the binary content). Using the example of "4298B.AF2663.eml", the file type is an eml email message file type. [124] The system preferably selects the content identifier 66 according to the file type of the item of content 68. To elaborate on this aspect, in any given file, there may be multiple content identifiers.
Using the example of an MS Word document, the following content identifiers may be present: the filename; the date the data file was created; the user that has ownership of the file; and within the data file itself: a document heading, and the author of the document. [125] For each file type, the content identifier to obtain is determined according to which content identifier 66 is used by convention as the "document title" by the applications that are used to read or modify that file type. To explain further, the "document title" is the human-readable identifier associated with that file type, according to the prevailing standards of the applications that access the file type. [126] In most cases, the content identifier 66 that represents a "document title" is the filename 70. For an email file type (common examples being .eml or .msg files), the subject line contained in the header portion of the data file is used as the document title by applications such as Microsoft Outlook, Apple Mail and Google Gmail. [127] The reason that the "document title" is the preferred content identifier 66 to obtain relates to its use in the window title of applications such as word processors, web browsers and email clients (the reason that the window title is important is further explained later in the specification). [128] By way of further explanation of the reasoning for selecting different content identifiers based on file type, reference is made to the OS X Human Interface Guidelines, in the section titled 'UI Element Guidelines: Windows: The Window Title'. This guideline states that 'the title of a document window should be the name of the document that it displays', and additionally, not to 'display pathnames in window titles'. The Microsoft Windows developer documentation also asserts the same principle in 'Title Bars: UI Text Guidelines', for 'Document window' text to use a 'Document title'.
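By way of illustration only, the selection of a content identifier according to file type described above might be sketched as follows. The pattern used here is an assumed variant of the filename regex statement: it is anchored and uses a non-greedy first group so that the descriptive portion does not consume the extension. The names `split_filename`, `select_content_identifier` and `EMAIL_EXTENSIONS` are hypothetical, and only the text-based .eml format is handled.

```python
import re
from email import message_from_string
from typing import Optional, Tuple

# Group 1: descriptive portion; group 2: extension after the last period (if any).
FILENAME_RE = re.compile(r"^(.*?)(?:\.([^.]*))?$")

# Assumed set of extensions treated as email message files in this sketch.
EMAIL_EXTENSIONS = {"eml"}


def split_filename(filename: str) -> Tuple[str, Optional[str]]:
    """Split a filename into its descriptive portion and extension portion."""
    match = FILENAME_RE.match(filename)
    if match is None:  # e.g. a filename containing a line break
        return filename, None
    return match.group(1), match.group(2)


def select_content_identifier(filename: str, data: str = "") -> str:
    """Return the text conventionally shown as the 'document title'."""
    stem, extension = split_filename(filename)
    if extension is not None and extension.lower() in EMAIL_EXTENSIONS:
        # For an email message the subject line is the document title;
        # a missing Subject field yields a zero-length string.
        return message_from_string(data).get("Subject", "")
    return stem
```

For a file without an extension, `split_filename` returns `None` as the extension, which is the point at which an implementation could fall back to examining the binary content.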
[129] In most applications, the 'name of the document' (also called a document title) is hence the document filename. A notable exception is with respect to email applications, where the underlying file name is often inaccessible and/or not displayed to the user, and hence the 'document title' in email applications is invariably the subject line of the message. This is because the defining characteristic of an email is the message-ID, as opposed to the more typical use of a filename. [130] According to one embodiment, whether the file type is an email message file type is determined at 1068. If the file type is not an email message, then the filename 70 is used as the selected content identifier 66. According to one embodiment, the text of the content identifier 66 is the first group of the filename regex statement described above. In this case by example, if the filename 70 is "Ltr To Client.docx", the text of the content identifier 66 is "Ltr To Client". [131] In the case of an email, the content identifier 66 selected is the subject of the email. The subject line is contained within the data file, located according to the syntax of the email file at 1072. [132] The syntax of an EML email file is defined, for example, in RFC 822, Standard for ARPA Internet Text Messages. It defines a syntax whereby the subject appears within the email header. In the case that the 'Subject' field does not appear in the headers section of the message, the content identifier can be considered as a zero-length string. An example email header portion follows:
Date: 26 Aug 98 1430 EDT
From: George Jones <Group@Host>
To: Sam.Irving@Other-Host
Subject: Re: Pressure valve testing
Message-ID: <some.string@SHOST>
[133] In this example, the content identifier obtained is "Re: Pressure valve testing". In the example illustrated in Fig. 17, the subject is "Re: Expert Evidence".
The subject line is extracted by parsing each line not starting with whitespace within the header portion of the email, looking for the "Subject:" field definition at 1074. The text following this field heading until the next field heading is the email subject 66. Hence, the text of the email subject is the text of the content identifier 66, for an EML email message according to this embodiment. According to one embodiment, the selection of the subject line as the selected content identifier 66, together with the text of the selected content identifier (i.e. the subject text), is sent to an analysis identifier determiner 46 at 1076. [134] The analysis identifier determiner 46 preferably determines whether the text of the content identifier 66 contains an analysis identifier 80. The analysis identifier 80 is an identifier that can serve as a unique or quasi-unique identifier for the given activity, in the form of a piece of context. The 'meaning' of what that item of content is or is not may not be known at the time that the analysis identifier 80 is generated or associated with a piece of content. For example, in an alternative embodiment where local files are monitored, and an analysis identifier 80 is attached to local files, an analysis identifier 80 may be associated with the creation date timestamp or some other quasi-unique property of the content. As the item of content is moved, saved or tagged, or as other forms of contextual information become available, the analysis identifier 80 provides a determinative identifier, preferably as an item of content moves between different content repositories, such as a file server, an email server, and a document management server. [135] To elaborate on the term 'contextual information', this may encompass information that gives context or supplementary information to an item of content, and is sometimes described as meta-data, or 'data about data'.
An example of contextual information is an item of content being saved by a user into a 'financial' section of an organisation's document management server. On taking this action, contextual information can be associated with the analysis identifier - viz. that the given item of content relates to financial matters. As another example, if the item of content was saved to a legal docketing system, that adds the contextual information of which legal case/matter the item of content is associated with. The nature of the inferences that may be drawn from the contextual information may not be known at the time that an action occurs in connection with an item of content having an analysis identifier 80. [136] It is preferable that each new event or action that is detected in connection with an item of content having an analysis identifier 80 be recorded in the Analysis Identifier Data Store. The event or action should be associated (linked) with the analysis identifier 80. [137] The analysis identifier 80 is preferably expressed as any sequence of characters, symbols or numbers. It is preferable for an identification string 82 to be contained in the analysis identifier 80. In its simplest form, the analysis identifier 80 may simply be an incrementing number, and the identification string 82 is that incrementing number. In a more preferable form, the analysis identifier 80 includes other derivable elements in addition to the identification string 82. [138] In its simplest form, the identification string 82 may be in the form of a sequence of numerical characters. According to one embodiment, a simple schema (syntax) for an analysis identifier may take the form of an incrementally generated number (defined as regex statement "(^|[\s])+([0-9]+)($|\s)", which we shall call 'sample analysis identifier a' herein). The numerical sequence is an identification string 82.
The sequence of numerical characters may identify a category or grouping, such as a customer number, a matter number or an organisational group. More preferably, the sequence of numerical characters may be unique to this item of content 68. By adding this additional layer of abstraction, greater flexibility is achieved when undertaking subsequent analysis of the data ultimately collected, when compared to using a non-unique number such as a customer number or a matter number. [139] In Fig. 6a-6b, examples are provided, whereby the first column is the text of the content identifier, and the second column summarises a basic determination of whether an analysis identifier is present. Let 34276 be a matter identifier known to the knowledge worker's organization. [140] In an embodiment adopting sample analysis identifier a, four examples are provided in Fig. 6a. It will be appreciated that the match of the identification string of "66" in "Re: Driving via route 66" is likely in error. [141] In a more preferable arrangement, an analysis identifier may take the form of a sequence defined by "(^|[\[\s])+([0-9]+)($|[\s\]])", which is called sample analysis identifier B herein. A series of examples are defined in Fig. 6b using sample analysis identifier B. By including a requirement that the analysis identifier 80 be enclosed by square brackets, the accuracy of the analysis identifier determiner 46 is improved, because the additional requirement of the numerical value being enclosed in square brackets will reduce the number of potential matches for any given sequence. [142] Functions to further improve the accuracy of the analysis identifier 80 are described further below. [143] The analysis identifier determiner 46 preferably determines whether the text of the content identifier 66 contains an analysis identifier 80. An analysis identifier builder 48 is provided.
In the event that it is determined that an analysis identifier 80 is absent from the text of the content identifier 66, the analysis identifier builder 48 is used to build an analysis identifier 80 for the item of content 68. [144] Fig. 7a / 6c illustrates the elements of an analysis identifier 80 according to one embodiment. A sample analysis identifier 84 is shown (called sample analysis identifier 5 herein). The exploded box outlines in the first column the elements that this analysis identifier 80 is comprised of, and in the second column the sample values from sample analysis identifier 5 84. Potential matches for sample analysis identifier 5 84 can be initially identified by applying the regex statement "\[\d+-GE-2-[0-9A-F]+\]". [145] The identification string 82 is obtained from an analysis identifier repository manager 28 according to one embodiment. If an analysis identifier repository manager 28 is not accessible, a probabilistically unique identifier is used as the identification string 82. The probabilistically unique identifier may be, for example, the concatenation of the strings <username>@<millis since epoch>_<machine name>. All three variables can be calculated with reference to an operating system and hardware clock. An example identification string 82 is hence thomas@1368976830_Zeus. [146] The identification string 82 in another embodiment is a category identifier in the form of a case or matter number, ascertained from contextual information connected with the item of content 68, for example the location at which the item of content 68 is stored in the digital content repository 10. By further example, at a law firm where all digital content associated with a matter is stored in a single folder labelled with the matter identification number, the analysis identifier builder may use the file path of the item of content 68 as the identification string 82.
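By way of illustration only, obtaining an identification string 82 from the repository manager, with a fallback to a probabilistically unique identifier of the form <username>@<millis since epoch>_<machine name>, might be sketched as follows. The function names and the `request_from_manager` callable are hypothetical stand-ins for the analysis identifier repository manager 28, and treating an `OSError` as "not accessible" is an assumption of this sketch.

```python
import getpass
import socket
import time
from typing import Callable, Optional


def fallback_identification_string(
    username: Optional[str] = None,
    machine: Optional[str] = None,
    now_ms: Optional[int] = None,
) -> str:
    """Build '<username>@<millis since epoch>_<machine name>'.

    Values not supplied are read from the operating system and clock.
    """
    username = username or getpass.getuser()
    machine = machine or socket.gethostname()
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{username}@{now_ms}_{machine}"


def obtain_identification_string(
    request_from_manager: Optional[Callable[[], str]] = None,
    **fallback_values,
) -> str:
    """Prefer the repository manager; fall back when it cannot be reached."""
    if request_from_manager is not None:
        try:
            return request_from_manager()
        except OSError:
            # Network-style failures (ConnectionError is a subclass) are
            # treated here as the manager being inaccessible.
            pass
    return fallback_identification_string(**fallback_values)
```

The explicit `username`, `machine` and `now_ms` parameters exist only to make the sketch deterministic when exercised; a caller would normally omit them.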
[147] By accessing an analysis identifier repository manager 28, the system 8 may retain a unique or near-unique set of analysis identifiers associated with any given item of content 68. [148] The analysis identifier repository manager 28 stores data in an analysis identifier data store 92, in the form of an SQL database. The mechanics and operation of SQL databases are well known, and basic usage thereof is understood by a skilled reader. [149] Whenever an analysis identifier is generated and/or applied to a content identifier, that analysis identifier should preferably be associated with particulars of the item of content in the analysis identifier data store 92. By way of example, when an item of content (content piece A) is received by the mail server, the email message id is associated with the given analysis identifier (identifier A). [150] This normalization aspect of associating a single content identifier across multiple digital content repositories can be used to improve the accuracy of the user activity data processing module. For example, when an email is received, there may be little known about the email, except for the header and text of the message itself. If that same email message is saved into a legal case, there is now additional information known, namely that the email concerned that legal case. [151] An entity-relationship diagram of the analysis identifier data store 92 is illustrated in Fig. 8. The analysis identifier repository manager maintains an analysis identifier data store comprising a list of issued identification strings, and associated with each said issued identification string a list of one or more content repository identifier records, each content repository identifier record containing a repository identifier and a content identifier. [152] An entity identifier 86 is provided in sample analysis identifier 5 84 to reduce the likelihood of cross-domain conflicts.
For example, if two organizations, OrgA and OrgB, were each running an identical system for collecting definitive data, there may be conflicts in the analysis identifier 80. That is, the analysis identifier 80 generated by the analysis identifier builder 48 at OrgA may be interpreted as a valid analysis identifier 80 belonging to OrgB by the analysis identifier determiner 46 running at OrgB. By providing an entity identifier 86, such as "GE" for the "General Electric" organization, there is a reduced likelihood of cross-domain conflicts. The issue of cross-domain conflicts is also mitigated by use of a checksum portion 90. [153] A version identifier 88 is provided in sample analysis identifier 5 84 to enable the analysis identifier determiner 46 to apply differing sets of rules to determine whether a given string is an analysis identifier 80, and maintain backwards compatibility in the event that incompatible rules are introduced (for example, if the checksum portion calculation is updated or changed). [154] A checksum portion 90 is provided in sample analysis identifier 5 84, providing three benefits: first, increasing the accuracy of the analysis identifier determiner 46 by reducing the likelihood of a typographic error being introduced; second, preventing the analysis identifier 80 being tampered with; and third, reducing the likelihood of cross-domain conflicts (as described above). [155] A checksum portion calculator 50 is provided to generate the checksum portion 90 based on the identification string 82, the entity identifier 86 and the version identifier 88. The algorithmic operations of the checksum calculator 50 are illustrated in Fig. 7b, according to one embodiment. [156] A concatenated raw string is created, comprised of a security string in the form of a salt string, the identification string 82, the entity identifier 86 and the version identifier 88.
The salt string is used to include additional private data to obfuscate how the checksum portion 90 is calculated. To facilitate further explanation, a salt string of "sTsT4mRIdlKvI" is selected according to one embodiment. The important characteristic of the salt string is that it is not publicly known. [157] For the sample values of sample analysis identifier 5 84, the concatenated raw string created at 1078 is "sTsT4mRIdlKvI-162537-GE-2". Next, the concatenated raw string is converted to upper case at 1079. In some applications and/or operating systems, the case information of filenames or other types of content identifiers may be discarded. [158] Next, a hash string is calculated by applying the md5 cryptographic hash function using the concatenated raw string as its input at 1080, and the output data is represented in base 16 (hex). Using a POSIX OS with openssl, this command performs this function: echo "sTsT4mRIdlKvI-162537-GE-2" | openssl dgst -md5 output = 9da45a34baff20588f7da85a1f1151f6. As hash functions typically consider case when creating a hash of a string, the step at 1079 ensures that case does not affect the hash function result. [159] The 'base 16' output format is selected because the characters are common amongst most fonts and the format is case-insensitive. Accordingly, adoption of a case-insensitive encoding schema is preferable, as case information may be discarded from the text of the content identifier. In another embodiment, the 'base 32' encoding standard may be utilized. [160] The calculated hash string is truncated to include the first 6 characters only at 1082. The truncated hash string is the checksum portion 90 (9da45a using the sample values). It will be appreciated that any suitable cryptographic hash function may be adopted, and some functions may have a smaller checksum size (e.g. CRC-16), in which case the truncation of the string may be a redundant step.
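By way of illustration only, the operations of the checksum calculator 50 at 1078 to 1082 might be sketched as follows. The hyphen separator between the concatenated elements and the function name `checksum_portion` are assumptions of this sketch. The digest it produces is not claimed to reproduce the sample value quoted above, since details such as a trailing newline in the hashed input, or whether upper-casing is applied first, change the result.

```python
import hashlib


def checksum_portion(
    identification: str,
    entity: str,
    version: str,
    salt: str,
    length: int = 6,
) -> str:
    """Derive a short hexadecimal checksum from the identifier elements.

    Steps: concatenate salt and elements, convert to upper case so that
    later loss of case information does not matter, hash with md5,
    express the digest in base 16, and keep the first `length` characters.
    """
    raw = f"{salt}-{identification}-{entity}-{version}".upper()
    digest = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return digest[:length]
```

Because the raw string is upper-cased before hashing, `checksum_portion("162537", "ge", "2", salt)` and `checksum_portion("162537", "GE", "2", salt)` yield the same value, which is the property the step at 1079 is intended to provide.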
[161] The checksum portion 90 is appended to the text of the analysis identifier 80. Ergo, the resulting analysis identifier 80 is "162537-GE-2-9da45a". [162] Processes that share similarities with the operations of the analysis identifier builder are performed by the analysis identifier determiner 46. The analysis identifier determiner 46 determines whether the text of the content identifier contains an analysis identifier 80. Preferably, the analysis identifier 80 should contain elements that allow its authenticity to be validated. [163] Referring to Fig. 10, the analysis identifier determiner 46 preferably loops through each regex match found in the text of the content identifier. If there are no matches remaining at 1086, then the text of the content identifier does not contain an analysis identifier at 1088. There may be more than one regex match to check. For each regex match in the text of the content identifier, the matched string is extracted using the regex statement at 1090. The matched string is broken down into one or more elements, the elements of an analysis identifier 80 having been described above. [164] The checksum portion 90 is calculated by the checksum calculator 50, as previously described, with reference to the identification string portion, the salt string and the entity identifier portion. The version identifier may also be included. Once the checksum portion is calculated, it is compared with the matched checksum portion of the matched text at 1092. If the hexadecimal values of the two checksum portions are not equal, then the matched string is not an analysis identifier 80, and the foreach loop is again evaluated at 1086. [165] If the hexadecimal values of the two checksum portions are equal, the validity of the identification string is determined at 1096. To determine the validity of an identification string, the analysis identifier determiner may check that the identification string is within a valid range of numbers allocated to the given version identifier.
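By way of illustration only, the loop of Fig. 10 described above might be sketched as follows. The candidate pattern generalises the sample regex statement to any entity and version and accepts lower-case hexadecimal digits, which is an assumption; the `is_valid` callable is a hypothetical placeholder for whatever validity checks an implementation applies to the identification string, and the checksum helper repeats the md5 steps described earlier under the same assumed hyphen separator.

```python
import hashlib
import re
from typing import Callable, Optional

# Candidate form: [<identification>-<entity>-<version>-<checksum>]
CANDIDATE_RE = re.compile(r"\[(\d+)-([A-Za-z]+)-(\d+)-([0-9A-Fa-f]+)\]")


def checksum_portion(identification: str, entity: str, version: str,
                     salt: str, length: int = 6) -> str:
    """Upper-case the salted raw string, md5 it, keep the first hex characters."""
    raw = f"{salt}-{identification}-{entity}-{version}".upper()
    return hashlib.md5(raw.encode("utf-8")).hexdigest()[:length]


def find_analysis_identifier(
    text: str,
    salt: str,
    is_valid: Callable[[str], bool] = lambda identification: True,
) -> Optional[str]:
    """Return the identification string of the first authentic match, else None."""
    for match in CANDIDATE_RE.finditer(text):
        identification, entity, version, found = match.groups()
        expected = checksum_portion(identification, entity, version, salt)
        # Compare hexadecimal values without regard to case.
        if expected.lower() != found.lower():
            continue  # not an analysis identifier; evaluate the next match
        if not is_valid(identification):
            continue  # checksum matched but the identification string is invalid
        return identification
    return None
```

A string such as "Re: Driving via route 66" yields no candidate at all under the bracketed form, and a bracketed candidate with an altered checksum is skipped at the comparison step.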
[166] As another determination of validity, the analysis identifier determiner may check if a record of the identification string exists in the analysis identifier data store, by making a request to the analysis identifier repository manager. [167] A yet further check may be to ensure that the file content type contained in the analysis identifier data store for the given identification string matches the file type of the item of content 66 under consideration. [168] If the identification string of the matched string is invalid for any reason at 1098, control returns to the foreach loop, which is again evaluated at 1086. Otherwise, if the identification string is valid at 1098, the text of the content identifier contains an analysis identifier at 1100. [169] As the analysis identifier may be subject to new versions or changes over time, there may be more than one schema to search for. In this case, the operations outlined in Fig. 10 should be repeated for each version, schema or variance. [170] In an alternative embodiment, the creation date stamp of an item of content could be adopted as the identification string for that item of content. As the creation date stamp, in most cases, remains unmodified as an item of content is moved within a digital content repository, or copied to another digital content repository, it serves to provide a quasi-unique identification element for that item of content. This alternative method may be useful when an identification string is unable to be received from a central source, or where the usual source of the identification string is unavailable. [171] Additionally, upon the text of the content identifier having been determined to contain an analysis identifier 80, the content detection module sends the identifying particulars of the item of content and the identification string to the analysis identifier repository manager. 
By this mechanism, the same identification string can be used to locate the same item of content 66 residing on multiple digital content repositories 10. [172] A content identifier alteration module 52 is provided, preferably to alter the text of the content identifier to include the analysis identifier 80 built by the analysis identifier builder. For example, the filename is altered from "Staff Annoucement.docx" to "[162537-GE-2-9da45a] Staff Annoucement.docx". In the case of an email subject line, an example amendment of the text of the content identifier may be from "Re: Tuesday meeting" to "Re: [162538-GE-2-68aaf8] Tuesday meeting". The content identifier alteration module 52 may insert or amend/replace a content identifier to include an analysis identifier 80. [173] In addition to altering the string, the change of the text of the content identifier should be saved back to the relevant digital content repository. In the case of an email server accessible via IMAP, this is achieved by updating the contents of the email message data file, which is identified to the IMAP server using the messageId of the email message. [174] The analysis identifier is inserted at the start of the text of the content identifier for a filename, or near the start in the case of a subject line of an email message, to minimize the risk of truncation of the analysis identifier if the active window title is truncated by an active application or the operating system. [175] An analysis identifier removal module 62 may preferably be utilised to remove analysis identifiers from content data of one or more content identifiers of an outbound item of content, prior to the outbound item of content being transmitted to another entity via a public network. For example, an event sink on the Microsoft Outlook server may remove any valid content identifiers identified in the email message prior to the message being delivered to external recipients. 
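The string alterations described in [172] to [175] can be illustrated with a short Python sketch. The function names, the set of reply/forward prefixes recognised, and the bracketed pattern used for removal are assumptions made for illustration; in particular, the removal function strips by pattern only and does not validate the checksum portion, which a fuller implementation would be expected to do.

```python
import re

# Assumed reply/forward prefixes that should stay ahead of the identifier.
REPLY_PREFIX = re.compile(r"^((?:(?:re|fw|fwd):\s*)+)", re.IGNORECASE)
# Assumed bracketed identifier layout, plus one optional trailing space.
BRACKETED_ID = re.compile(r"\[\d+-[A-Za-z]+-\d+-[0-9A-Fa-f]{6}\]\s?")


def tag_filename(filename: str, analysis_identifier: str) -> str:
    """Insert the identifier at the very start of a filename."""
    return f"[{analysis_identifier}] {filename}"


def tag_subject(subject: str, analysis_identifier: str) -> str:
    """Insert the identifier near the start of a subject, after any Re:/Fw:."""
    match = REPLY_PREFIX.match(subject)
    split = match.end() if match else 0
    return f"{subject[:split]}[{analysis_identifier}] {subject[split:]}"


def strip_identifiers(text: str) -> str:
    """Remove bracketed identifiers, e.g. before a message leaves the entity."""
    return BRACKETED_ID.sub("", text)


if __name__ == "__main__":
    print(tag_filename("Staff Annoucement.docx", "162537-GE-2-9da45a"))
    print(tag_subject("Re: Tuesday meeting", "162538-GE-2-68aaf8"))
    print(strip_identifiers("Re: [162538-GE-2-68aaf8] Tuesday meeting"))
```

Placing the identifier at or near the start follows the rationale in [174]: a window title truncated on the right still carries the identifier.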
[176] The steps taken by the content identifier alteration module 52 when handling a digital content repository 10 that provides for a given item of content in the form of a document to be held in a locked or unlocked write state, and does not allow for files to be renamed when applications have access to a file (such as the Samba protocol), are illustrated in Fig. 11. A locked state may also be described as "checked-out" or "in use", depending on the terminology of the digital content repository 10. [177] To aid the reader's understanding, the preferable precursor steps are outlined; namely, that new content has been detected by the content detection module (at 1112), that a content identifier for the item of content has been selected by the content identifier selection module (at 1114), and that the analysis identifier builder has built an analysis identifier (at 1116). [178] An initial sleep at 1102 is desirable, to ensure there is time for the newly created file to be locked again, should the save and lock steps need to be performed sequentially by the process that created the item of content 66. [179] Next, it is determined whether the document is in a locked state at 1104. If it is locked, the process sleeps for a predetermined polling period of 4 seconds at 1106. If it is not locked, the text of the content identifier is altered by inserting the analysis identifier at 1008. The analysis identifier store is updated with the particulars of the item of content 66 at 1110. [180] The system includes a user activity data processing module 54 adapted to receive a plurality of user activity data records, each user activity data record associated with a user identifier, and each data row containing the user identifier's active window title, and a time indicator. [181] There are various approaches to passively collecting the information about the active window title on a given worker's computer. 
In the present embodiment, the Mac OS X operating system is used as the basis of describing this operation. A person skilled in the art would understand that the API calls under the Win32 environment may differ in name, but provide similar functionality. [182] There are two significant aspects to the creation of data rows that contain the user identifier (in the form of a username or email address), the active window title, and the time period that the window title was active. Firstly, that the window title, preferably also including the application name, is tracked; and secondly, that the active window is only tracked when it can be ascertained that the worker is actively engaged with the computer. If after a predetermined or dynamic time period, e.g. 60 seconds, there have been no inputs from the user such as moving the mouse or pressing a key, then the system should allocate all time following and until further input is detected as 'offline'. Preferably, a user dialog is presented to the user when user input is again detected, giving them the option to enter a comment about the time spent away from the computer. [183] In the present embodiment, a background process that links with the AppleScript API is adopted to implement this aspect of the invention. [184] At any time, the process name and window title of the current active application can be captured. In AppleScript on Mac OS X, this can be achieved by calling 'tell process frontAppName' and 'tell (1st window whose value of attribute "AXMain" is true)'. Using one approach, a background process can poll the AppleScript periodically to gather historic records of active application switching. This method is illustrated in Fig 16b. [185] The user identifier in the form of a username may be determined via AppleScript by calling 'set user-name to (short user name of (system info))'. Alternatively, if the username needs to mirror another system, i.e. 
a central directory of usernames maintained by the organization, or takes the form of an email address, it may be stored in a settings file accessible by the process that is collecting the information about the window titles and creating the data rows. [186] In an alternative configuration, a plurality of data rows may be received from an external data storage and application provider, such as that provided by the RescueTime.com application and its associated API. For organizations where the processes and methods are confidential, or where confidential information may be gleaned from analysis of the activities of its knowledge workers, or an organization with a large number of users, the use of an external data storage and application provider may be highly undesirable. [187] In another preferred configuration, the system registers to receive user action events, and obtains the application name and window title at the time that each user event occurs. [188] Each record has an associated time period, starting from the timestamp when the current active window was first detected until a new active window title bar description is detected. [189] For example, 'tell (1st window whose value of attribute "AXMain" is true)' may return the following sequence at the following times.... 
- 20130427-17:15:38.277: windowName=MS Word, windowTitle=P186TG-2-Pleadings.docx
- 20130427-17:16:20.061: windowName=MS Outlook, windowTitle=Re: P186TG2 Review of draft pleadings
- 20130427-17:16:22.783: windowName=Google Chrome, windowTitle=https://www.lexisnexis/?search=pleadings-precedent
- 20130427-17:16:23.463: windowName=Google Chrome, windowTitle=Pleadings precedents.pdf
- 20130427-17:16:54.634: windowName=Firefox, windowTitle=Case Docket WebApp - P642TH-6
- 20130427-17:16:55.307: windowName=MS Outlook, windowTitle=P642T6 Re: Follow-up re meeting
- 20130427-17:16:56.035: windowName=Google Chrome, windowTitle=TTPleadingv9 [P1682G-3]
- 20130427-17:17:01.221: windowName=Firefox, windowTitle=Case Docket WebApp: P642TH-6
- 20130427-17:17:05.285: windowName=MS Outlook, windowTitle=Re: Rotor assembly [P186TG-4]
- 20130427-17:17:17.577: windowName=Google Chrome, windowTitle=Rotar assembly [P186TG 4].CAD
[190] This approach takes advantage of a common windowing strategy of operating systems, and additionally of some applications that support an internal tabbed interface such as web browsers, where each window has a unique and transient 'Z order' to determine which window should be displayed to the user. At any time, there can only be one window or tab that is 'active', that is, that has the front-most Z order. [191] Fig. 4 details an algorithm implemented by the active window monitor module 52 (may also be referred to as window activity monitor module or WAMM). [192] When the process is first launched, the process registers to receive all direct user interaction events at 1210, including mouse events (movements or clicks) and keystroke events (keydown, touchscreen drag event, mouse click, keyup and so on). Other direct user interaction events may include touch-screen related events, or specialist assistive technologies for sensory impaired users. 
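To make the relationship between the sample sequence of [189] and the time periods of [188] concrete, the following Python sketch parses lines in that format and pairs each one with the timestamp of the next to form a period. It is a sketch only: the regular expression is inferred from the sample lines, and the function and field names are hypothetical. The final line of a sequence has no successor, so the sketch leaves it without a period.

```python
import re
from datetime import datetime

# Line layout inferred from the sample sequence (an assumption).
LINE = re.compile(
    r"^(\d{8}-\d{2}:\d{2}:\d{2}\.\d{3}): windowName=(.*?), windowTitle=(.*)$"
)


def parse_line(line: str):
    """Split one sample line into (timestamp, application name, window title)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    stamp, app, title = match.groups()
    return datetime.strptime(stamp, "%Y%m%d-%H:%M:%S.%f"), app, title


def to_periods(lines):
    """Each record runs from its own timestamp until the next one is detected."""
    parsed = [parse_line(line) for line in lines]
    periods = []
    for (start, app, title), (end, _, _) in zip(parsed, parsed[1:]):
        periods.append({"start": start, "end": end, "app": app, "title": title})
    return periods


sample = [
    "20130427-17:15:38.277: windowName=MS Word, windowTitle=P186TG-2-Pleadings.docx",
    "20130427-17:16:20.061: windowName=MS Outlook, windowTitle=Re: P186TG2 Review of draft pleadings",
    "20130427-17:16:22.783: windowName=Google Chrome, windowTitle=https://www.lexisnexis/?search=pleadings-precedent",
]

if __name__ == "__main__":
    for period in to_periods(sample):
        seconds = (period["end"] - period["start"]).total_seconds()
        print(period["app"], period["title"], seconds)
```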
[193] A convention of current operating systems is that whenever a document is opened, its window becomes the 'active' window. That is, no other windows overlay it. When a direct user interaction event is received, the next step of the active window monitor module 50 at 1212 is to obtain the application name and window title (method previously described). [194] Upon the operating system event bus sending a mouse event or keystroke event to the active window monitor module 50 at 1040, most particulars of the user interaction event are preferably discarded (for privacy or efficiency reasons). For example, in the case of a user interaction event in the form of the user having pressed a key on a keyboard, the key that was pressed should preferably be discarded, for both privacy and security reasons. [195] If a timestamp of when the event occurred is included in the user interaction event generated by the computing device 34, it should preferably be retained. Alternatively, if a timestamp is not included in the event, the timestamp should be obtained by reading the system's hardware clock. A user activity data record (that is, a log of the event) is created and ultimately saved to the user activity data store. The user activity data store is preferably located on a remote machine. Alternatively, the user activity data store may be stored on the computing device 34 that the active window monitor module 50 is monitoring. [196] At the time the user interaction event is received, the active application name and active window title should be obtained, to ascertain which window of the computing device 34 the user is interacting with. There may be many user interaction events received each second when the user is using their computing device 34, and hence it is preferable to amalgamate any identical and sequential user activity data records together, to form a start and end timestamp. 
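The amalgamation described in [196] might be sketched as below. This is an illustrative Python sketch: the `ActivityRecord` type, its field names and the use of floating point seconds for timestamps are assumptions, and the input events deliberately carry no keystroke or pointer details, consistent with [194].

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str, str]  # (timestamp, user, application, window title)


@dataclass
class ActivityRecord:
    user: str
    app: str
    title: str
    start: float
    end: float


def amalgamate(events: Iterable[Event]) -> List[ActivityRecord]:
    """Merge identical, sequential events into records with start/end times.

    Events are assumed to arrive sorted by timestamp.
    """
    records: List[ActivityRecord] = []
    for stamp, user, app, title in events:
        last = records[-1] if records else None
        if last and (last.user, last.app, last.title) == (user, app, title):
            last.end = stamp            # same window as before: extend the record
        else:
            records.append(ActivityRecord(user, app, title, stamp, stamp))
    return records
```

A window that is revisited after another window has been active produces a new record rather than extending the earlier one, so the sequence of activity is preserved.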
[197] A user activity data record preferably includes a start timestamp value and a user identifier. It may also include an end timestamp value, an active application name, and an active window title. A set of sample user activity data records 104 is illustrated in Fig. 18. [198] It is preferable for a subset of user activity data records to include user location information, such as geographical co-ordinates and/or a textual description of the location that the geographical co-ordinates reference. [199] A preferred mechanism for gathering the user activity data records having user location information is outlined herein, with reference to Fig. 14 and Fig. 15. A user location activity data record augmenter 108 is provided. The user location activity data record augmenter preferably runs periodically, triggered by a timer at 1462, for example once every 60 seconds. Alternatively, it may be event driven in some cases (the Apple iOS computing device, for example, provides a Significant-Change Location Service which is event driven). For privacy reasons, it is preferable that the user location activity data record augmenter 108 is only run during the hours specified by the user, for example, during standard office hours only. [200] The first step is to determine the amount of time that has elapsed since the user of the computing device caused a user input event to be triggered. That is, how long has it been since the user interacted with the computing device? This is determined at 1464 by retrieving the latest user event from either a local temporary storage such as memory or disk of the computing device, or with reference to the user activity data store 58. [201] If the user is determined at 1466 to have been active within a predetermined period (e.g. in the last 5 minutes), then the process ends at 1468. 
If the user is determined to be idle, the user location activity data record augmenter 108 then looks up a stored list of user location devices that are associated with the user at 1470. For example, this could be the user's RFID access badge, which is primarily used to gain entrance into the workplace building; another example is a mobile telephone having a GPS receiver unit. The GPS receiver as a user locator device has the advantage of a simple implementation, but provides more potential for a user's privacy to be invaded. Accordingly, the preferred approach of the present embodiment is to use an RFID tag, which is only locatable at the office location. The RFID tag approach illustrated in Fig. 15 also has the benefit of achieving a level of indoor accuracy that is not possible with current GPS technologies. For some industries, the GPS receiver may be the preferred approach, for example, with a roaming team of sales members, or users that work from home or as individuals. [202] For each user location device associated with the user, the device is probed for its location at 1472. If the location is obtained at 1474, the location information, time indicator and user identifier are saved to the user activity data store at 1476. The system loops user location activity data record augmenter 108 until all devices have been probed for their location at 1478. [203] An example of a method to probe an RFID tag for its location is illustrated in Fig. 15. Readers 1, 2 and 3 (R1, R2, R3) are fixed position RFID readers. Each reader is queried as to whether the given user's RFID access tag is within its readable range. In the example illustration, there are three users' tags, represented as Ta, Tb and Tc. The readers and tags are overlaid over an office floor plan, having three rooms (Room A, B and C). Each reader reports the signal strength of its read of the given user's RFID tag. This is shown with reference to tag Ta, as D1a, D3a and D2a. 
With these three signal strengths, the approximate location of the user location device can be obtained via trilateration. There is a wealth of information regarding RFID positioning systems, as they are presently used in industrial automation to track the movement of parts within a factory and so forth. [204] A timeline presentation module is provided. An example interface produced by a timeline presentation module is illustrated in Fig. 12a. [205] Fig. 13 outlines a preferred set of steps to enable the presentation module to have the data available to present a timeline of the user's activity for a given day. The system may also preferably be used to answer specific analytics questions, such as, for example, "how much time was spent by users this month reading emails from our human resources department?" or "what are the distinguishing characteristics of the way that outstanding worker X performs his daily duties?". [206] A preferable process of creating a timeline of pre-categorized user activity for a range of time (time period A) is initiated at 1402, as performed by the user activity data processing module 54. The user activity data processing module 54 loads user activity data records from the user activity data store 58 within the date/time range of time period A at 1404. [207] For each user activity data record in the set obtained at 1404, a process is defined starting from 1406. [208] If a 'WindowTitle' value is present, it is inspected using the analysis identifier determiner 46 at 1408, and if no analysis identifier 80 is present, no further action is required for this record at 1410. [209] If the WindowTitle value contains an analysis identifier 80 at 1408, the relevant category(s) or suggested category(s) for the user activity data record are calculated at 1412. 
The process for calculating a category in the form of a matter identifier or general category will now be described with reference to the needs of a professional services or legal firm. However, it will be appreciated that the process can also be applied to address other categorization requirements, and this is but one example. [210] The system may preferably choose to delineate between two types of categorization according to the surety of the match, such as a "confident" and "suggested" categorization surety, as one example. [211] The categorization of three example user activity data records will be described, and in this example, let there be 5000+ 'categories' in the form of specific matter identifiers whereby time is billed to a client, and three broader categories where no client is billed. [212] The reader is referred to the user activity data record (a-c) examples provided in Fig. 18. For each of these items, the analysis identifiers are identified (8-b729w, 7-cc54w and 9-h729w). In this example of a simple analysis identifier, an identification string and checksum portion are provided. The checksum portion is validated, and the identification strings 8, 7 and 9 are isolated. [213] Fig. 19a contains a table, representing one identifier record, as described in the ER diagram appearing in Fig. 8, whereby the table represents records of type ContentRepositoryRecord, stored in the analysis identifier data store. Fig. 19b and Fig. 19c contain information about identification strings 7 and 9 respectively. With reference to Fig. 19a, there is a single record appearing in the table. It describes that within the repositoryId labelled 1, which is a file server, the item of content represented by identification string 8 in that digital content repository is locatable by the string '/Matters/L1240/Pleadings [8-b729w].docx'. In this example, this is the file path where the document is stored on a file server. There are similar examples of file storage in Fig. 
19b and Fig. 19c. [214] In Fig. 19c, there is an additional digital content repository where the item of content has been stored, in the form of an email server. In this case, the contentLocator '20130516221008@adventadvisors.com' is the email message id, which can be used to locate the email message on the email server. In this example of Fig. 19c, the email was also copied to a file server, to a folder that stores documents associated with matter identifier L233. There may be logic for the given cases/matters that allows the system to derive the matter identifier from the file path where the document is stored. In this example, an identification string of 8 represents case L1240, and identification string 9 represents case L233. [215] By way of a further example, if the categorisation required were to categorise the recipient for time spent on email messages, and referencing Fig. 19c, it may be preferable to retrieve the item of content from the file server or email server, using the contentLocator value to locate the item of content on the given digital content repository represented by the repositoryId, and then examine the From/To headers of the email to categorise the time in accordance with the manner of categorisation requested. [216] There may be a rule in place that all financial documents relating to an example firm are stored in the Financial/ directory of a file server. In this case, with reference to Fig. 19b, user activity data records falling within the given time period having an identification string of 7 could be categorised as the example non-billable category of "Managing Firm Financial Affairs". Proceeding with the example, user activity data records having an identification string of 8 would be categorised as matter L1240, and user activity data records having an identification string of 9 would be categorised as matter L233. 
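The path-based categorisation of [213] to [216] can be sketched in Python as follows. The lookup table stands in for the analysis identifier data store; only the locator for identification string 8 is taken from the description, while the locators for 7 and 9, the function name and the rule ordering are hypothetical and chosen purely to exercise the two rules described.

```python
import re
from typing import Dict, Optional

# Stand-in for ContentRepositoryRecord lookups in the analysis identifier data store.
CONTENT_LOCATORS: Dict[str, str] = {
    "8": "/Matters/L1240/Pleadings [8-b729w].docx",  # locator described for Fig. 19a
    "7": "/Financial/Budget [7-cc54w].xlsx",          # hypothetical locator
    "9": "/Matters/L233/Advice [9-h729w].msg",        # hypothetical locator
}

# Assumed rule: the folder directly under /Matters/ names the matter identifier.
MATTER_PATH = re.compile(r"^/Matters/([^/]+)/")


def categorise(identification_string: str,
               locators: Dict[str, str] = CONTENT_LOCATORS) -> Optional[str]:
    """Derive a category from the file path recorded for an identification string."""
    path = locators.get(identification_string)
    if path is None:
        return None                       # unknown identifier: leave uncategorised
    match = MATTER_PATH.match(path)
    if match:
        return match.group(1)             # billable matter identifier
    if path.startswith("/Financial/"):
        return "Managing Firm Financial Affairs"  # example non-billable category
    return None


if __name__ == "__main__":
    for ident in ("8", "7", "9"):
        print(ident, categorise(ident))
```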
[217] In one embodiment, the matter identifier may be derived from the analysis identifier itself, in which case no reference would be required to the analysis identifier store. For example, the analysis identifier builder may use the matter identifier as the identification string, e.g. Pleadings [L1240 b729w].docx. This approach reduces complexity on the one hand, and reduces the breadth of categorisation strategies that can be performed on the other. [218] For each user data activity record appearing in Fig. 18, a category value has now been determined. Returning to Fig. 13, the relevant determined category is attached to the user data activity record at 1414. The form of attachment may be ephemeral, being relevant only to the current categorization task being performed, or persistent in some cases of a routine or repetitive categorization task. The foreach loop for the given user data activity record ends at 1416. [219] Once the foreach loop that attempts to categorise each user data activity record in accordance with the categorisation strategy of interest has been executed, a timeline presentation module 56 presents the categorised user data activity records to the user at 1418. The timeline presentation module 56 may preferably additionally display any user data activity records having user location information next to the relevant time periods, to provide additional contextual information to assist with the categorisation process. [220] Example output from a timeline presentation module 56 is provided in Figures 12a-12c. A categorisation column appears at 106. 
Upon the user having finalised the visual categorisation process that the timeline presentation module 56 provides, the user activity records may be submitted to a system such as a billing or timesheet recordal system, including the relevant categorization information, the application and window title if such is available, the user's location if such is available, and any additional comments or meta data that may be available and relevant to that external system. [221] For illustrative purposes, in the example contained in Fig. 12a-c, a basic form of an analysis identifier is shown, whereby the categorisation identifier can be directly derived from the analysis identifier itself. [222] If there is location information that matches the user's location during an offline/idle period from the same organisation, it may be preferable to include this also. For example, at meeting room A with Jeff and Todd. This is made possible by searching for any other matching location data from users during the same time period or point in time. [223] A preferable aspect of presenting a timeline in sequential order in the case that the user is to review the entries is that this format may provide additional context to assist the user's recollection and/or understanding of what a time entry concerns with reference to the preceding and following entries. [224] The invention as described performs a series of operations in concert, to create a tangible data set that provides a degree of definiteness hitherto unachieved that is capable of operation with respect to most if not all applications used by the computing device of the knowledge worker, without each and every application on that computing device having to be altered to conform to additional requirements. 
That is, to passively gather definitive data from the activity of a knowledge worker interacting with a computing device, without resort to invading or analysing the content being viewed or created, via machine learning or analysis techniques, which are often unreliable. [225] The system may facilitate, for example, reliable categorisation of time expenditure of knowledge worker activity based on post-facto determined analytical variables. That is to say, the analytical variables applied to a given knowledge worker's activity can be applied retrospectively, and according to when the knowledge worker was in fact actively performing some action and during what time period that act was performed, rather than with reference to an inference drawn from some intangible metric, such as the log file of a computer device confirming that an item of content was fetched by a given user. [226] It may be desirable to remove the analysis identifier, so that it is not visible to an external recipient. In this case, the TO, CC and BCC headers are examined, and if any domain does not match the FROM domain, the message is considered to be travelling externally. This may be preferable if an organisation wanted to conceal the analysis identifier from entities that are external to the organization. [227] The systems, servers and methods as described in the preferred embodiment can continue to function in the case that items of content are stored in an encrypted format, decipherable only by certain knowledge workers. [228] It will be appreciated that embodiments of the present invention provide a method, apparatus, server, system, and computer-readable medium for collecting definitive data relating to time expenditure of a knowledge worker. 
Although the invention has been described in language specific to computer structural features, methodological acts, and computer readable media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific structures, acts or media described. Therefore, the specific structural features, acts and media are disclosed as exemplary embodiments implementing the claimed invention. Moreover, it should be appreciated that, according to the embodiments of the invention, the software described herein has been implemented as a software program executing on a computer system. Alternatively, however, the software operations described herein may be performed by a dedicated hardware circuit, by program code executing on a general-purpose or specific-purpose microprocessor, or through some other combination of hardware and software. Alternatively, it may be implemented via a non-transitory computer-readable medium comprising computer-readable instructions for execution by a server or computer. [229] Any reference to 'definitive data' should be understood to denote an improvement in the definitiveness and/or the accuracy of the data that the systems, servers and methods may collect, and should not be understood in the sense of data being perfect. [230] Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents. 
[231] As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, apparatus, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon. [232] The described aspects of the invention may also be implemented as a computer-controlled apparatus, a computer process, a computing system, an apparatus, or as an article of manufacture such as a computer program product or computer-readable medium. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and/or encoding a computer program of instructions for executing a computer process. [233] In the present embodiment, there are detailed descriptions of implementation details with reference to specific operating systems. The skilled reader will appreciate that any methods or functions defined with reference to a specific operating system may have equivalent or analogous operations available in other operating systems. While the equivalent or analogous operations may differ in name and in the underlying implementation, alternatives may be used to achieve similar effect to that described. [234] The present embodiment describes a sequence of modules, calculators, managers and data stores. 
These elements can be run across a plurality of computing devices 14 that are connected via one or more computer networks. They may be implemented using preconfigured digital circuits or circuitry, or using precompiled or pre-sequenced software that is executed on digital circuitry. The elements could be grouped into a single computing device 14, or a single application running on the computing device 14. The description of the system and method with reference to logical entities is to aid the reader's understanding of the system and method, the description and figures are to illustrate and inform, and the segmentation of responsibility of the elements, or general arrangement of responsibilities of the elements, may be varied according to the specific technical considerations of each implementation. [235] Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
[236] A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. [237] Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. [238] Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). [239] Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. 
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. [240] These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or the programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. [241] Referring again to Figs. 1-8, the diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. 
In this regard, each block in a flowchart or a block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. [242] Accordingly, techniques of the invention, for example, as depicted in Figs. 1-8, can also include, as described herein, providing a system, wherein the system includes distinct modules (e.g., modules comprising software, hardware or software and hardware). One or more embodiments can make use of software running on a general purpose computer or workstation. With reference to Fig. 2, such an implementation may employ, for example, a processor device 60, a memory 68, and an input/output interface formed, for example, by a display and a keyboard 70. 
The term "processor device" as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term "processor device" may refer to more than one individual processor. The term "memory" is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory), ROM (read only memory), a fixed memory device (for example, hard drive), a removable memory device (for example, diskette), a flash memory and the like. In addition, the phrase "input/output interface" as used herein, is intended to optionally include, for example, one or more mechanisms for inputting data to the processing unit (for example, keyboard or mouse), and one or more mechanisms for providing results associated with the processing unit (for example, display or printer). [243] A processor device, memory, and input/output interface such as a display and mouse may be interconnected, for example, via a bus as part of a data processing unit. Suitable interconnections, for example, via a bus, can also be provided to a network interface, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with media.

Claims

1. A computer implemented method for categorizing time expenditure of a computing device user is provided, comprising: detecting an item of content has been added to one or more digital content repositories; selecting a content identifier from the item of content; determining whether text of the content identifier contains an analysis identifier, the analysis identifier having an identification string, and whereupon an analysis identifier is determined to be absent from the content data of the content identifier building an analysis identifier, and altering the content data of the content identifier to include the built analysis identifier; and receiving a plurality of user activity data records, each user activity data record associated with a user identifier, and each user activity data record containing a time indicator and the active window title of the computing device at or during that time indicator.

2. A computer implemented method according to claim 1, whereby selection of the content identifier is based at least in part on the file type of the item of content.

3. A computer implemented method according to claim 1 or 2, wherein the operation of detecting an item of content has been added to one or more digital content repositories further comprises detecting an item of content has been requested from the one or more digital content repositories.

4. A computer implemented method according to any one of the preceding claims, wherein the content data of the content identifier is altered by inserting the analysis identifier at the start, or near the start of, the content data of the content identifier.

5. A computer implemented method according to any one of the preceding claims, wherein the analysis identifier further includes an entity identifier and checksum portion.

6. A system for categorizing time expenditure of a computing device user is provided, comprising a content detection module, to detect an item of content has been added to one or more digital content repositories; a content identifier selection module, to select a content identifier from the item of content; an analysis identifier determiner, to determine whether content data of the content identifier contains an analysis identifier, the analysis identifier having an identification string; an analysis identifier builder to build an analysis identifier for the item of content; a content identifier alteration module, to alter the content data of the content identifier to include the analysis identifier built by the analysis identifier builder; a user activity data processing module adapted to receive a plurality of user activity data records, each user activity data record associated with a user identifier, and each user activity data record containing a time indicator and the active window title of the computing device at or during that time indicator; and whereupon the analysis identifier determiner having determined that an analysis identifier is absent from the content data of the content identifier, the analysis identifier builder creating an analysis identifier, and the content identifier alteration module thereafter altering the content data of the content identifier to include that analysis identifier.

7. A system according to claim 6, whereby the content identifier selected is based at least in part on the file type of the item of content.

8. A system according to any one of claims 6 to 8, wherein the analysis identifier includes a checksum portion; and the system further comprising a checksum calculator to generate a checksum value based on the identification string, and wherein the checksum calculator is operatively coupled with the analysis identifier determiner to assess the validity of a checksum value; and operatively coupled to the content alteration module for building the analysis identifier.

9. A system according to any one of claims 6 to 9, further comprising an active window monitor module, to monitor the active window title of a computing device, by registering to receive user input events of the computing device using a windowing system, and obtaining the window title of the foremost window of the computing device when user input events are received; and whereby the active window monitor module determines whether the window title contains an analysis identifier, and whereupon an analysis identifier is absent, finding an item of content that correlates to the active window.

10. A data carrier storing computer program code adapted to perform the method steps of claim 1 when said program code is executed by a computer.
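
By way of illustration only, the operations recited in claims 1, 4, 5 and 8 might be sketched in Python as follows. The claims do not fix a format for the analysis identifier, a checksum algorithm, or a way of turning activity records into time totals, so everything below is an assumption made for the example: the bracketed "[ENTITY-IDENTIFICATION-CHECKSUM]" layout, the use of CRC32 for the checksum, the fixed sampling interval, and all function and class names are hypothetical.

```python
import re
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical layout: "[<entity identifier>-<identification string>-<checksum>]".
ANALYSIS_ID = re.compile(r"\[([A-Z]{2,4})-([0-9A-F]{8})-([0-9A-F]{2})\]")


def checksum(identification: str) -> str:
    """Two hex digits derived from the identification string (CRC32 is an assumption)."""
    return f"{zlib.crc32(identification.encode('utf-8')) & 0xFF:02X}"


def find_analysis_id(text: str) -> Optional[str]:
    """Return the first analysis identifier in text whose checksum validates, else None."""
    for match in ANALYSIS_ID.finditer(text):
        if checksum(match.group(2)) == match.group(3):
            return match.group(0)
    return None


def build_analysis_id(entity: str, serial: int) -> str:
    """Build an identifier from an entity identifier and a serial number."""
    identification = f"{serial:08X}"
    return f"[{entity}-{identification}-{checksum(identification)}]"


def tag_content_identifier(content_identifier: str, entity: str, serial: int) -> str:
    """Insert a newly built analysis identifier at the start, if none is present."""
    if find_analysis_id(content_identifier) is not None:
        return content_identifier
    return f"{build_analysis_id(entity, serial)} {content_identifier}"


@dataclass
class ActivityRecord:
    user_id: str
    timestamp: float   # time indicator, in seconds
    window_title: str  # active window title at that time


def seconds_per_identifier(
    records: Iterable[ActivityRecord], sample_interval: float = 1.0
) -> Dict[str, float]:
    """Attribute one sampling interval to each record whose title carries an identifier."""
    totals: Dict[str, float] = {}
    for record in records:
        key = find_analysis_id(record.window_title)
        if key is not None:
            totals[key] = totals.get(key, 0.0) + sample_interval
    return totals
```

In this sketch a document title such as "Quarterly report.docx" would be rewritten once with the identifier prepended, and any later window title containing that identifier can be attributed to the same item of content. Placing the identifier at the start of the title follows claim 4 and makes it less likely to be lost if a windowing system truncates long titles.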