GB2373698A - Storage of a portion of a web-page containing a link - Google Patents


Info

Publication number: GB2373698A
Authority: GB (United Kingdom)
Prior art keywords: link, page, elements, user, script
Legal status: Withdrawn (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: GB0106920A
Other versions: GB0106920D0 (en)
Inventor: Geraint Edwards
Current Assignee: COPYN Ltd
Original Assignee: COPYN Ltd

Application filed by COPYN Ltd
Priority to GB0106920A
Publication of GB0106920D0
Publication of GB2373698A
Status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Abstract

Markup elements on a web page, which include a link to a further web page, can be selected and stored in a temporary store. The link is then activated and the user is taken to the further page. At the further page, the user can select elements of the pages to store for future retrieval. These elements include the link and any associated image and text. This allows the user subsequently to access the further page from an advertisement or the like, even if the advertisement is no longer being carried by the page that originally carried the link.

Description

CAPTURE OF MARKUP ELEMENTS
This invention relates to the capture of content from computer networks such as the Internet. It is particularly concerned with the storage and subsequent retrieval of that content.
Our earlier application GB 0021081, entitled Capture, Storage and Retrieval of Mark-Up Elements, filed 25 August 2000, described techniques for extracting and storing mark-up elements from web pages. The invention disclosed there is useful in that it allows users to store only the parts of web pages in which they are interested, and these parts can be presented in a free-form, non-hierarchical manner which is far more user-friendly than the bookmarking/favourites options previously available in Internet browsers such as Microsoft Internet Explorer and Netscape Navigator. In the system described in the earlier application, the user sees the actual stored elements rather than a text heading or the like.
Although the system described in the earlier application is most beneficial and a significant improvement over prior art systems, we have appreciated that it does not handle dynamically varying web content well.
In the system of the earlier application, elements are stored in a repository for subsequent retrieval and viewing. However, it is sometimes difficult to save the actual content required. Consider, as an example, an advertisement on a web page in which the user is interested. The advertisement most likely does not come from the provider of the page that is being viewed but from a completely different source. Moreover, the advertisement that is displayed is likely to change every time the host web page is viewed. There may be, for example, a number of different advertisements which are displayed in turn when the host page is accessed.
The user will click on the advertisement to access its source page. If he wants to save a visual reminder of that page, that is, the advertisement that took him there in the first place, he would like to be able to return to the original page and save the element on that page, namely the link to the advertiser's web site and any associated image. This reminds him why he became interested in the page. However, when he returns to the original web site he will find that a different advertisement is being displayed, and so he can no longer save the element in which he was interested.
The present invention aims to overcome this problem.
Accordingly, there is provided a method of storing a portion of a first mark-up language page containing a link to a further mark-up language page, comprising the steps of: identifying, from a visual representation of the first page, a portion of the visual representation including a link to the further page; storing the portion including the link in a temporary store; initiating the link to access the further page; at the further page, identifying a list of candidate mark-up elements from a predefined set of elements for storage, the list including the visual representation including the stored portion from the first page; selecting elements from the list; and storing the selected elements.
The invention also provides apparatus for storing a portion of a first mark-up language page containing a link to a further mark-up language page, comprising: means for identifying, from a visual representation of the first page, a portion of the visual representation including a link to the further page; a temporary store for temporarily storing the portion including the link; a device for initiating the link to access the further page; means for identifying a list of candidate mark-up elements at the further page from a predefined set of elements for storage, the list including the visual representation including the stored portion from the first page; means for selecting elements from the list; and a store for storing the selected elements.
Embodiments of the invention have the advantage that the user can save, from the advertiser's site, the raw mark-up code of an advertisement or the like that contains a link to that site, because the code is stored temporarily before the link to that site is initiated. At the new site, the stored code can be retrieved and the advertisement and link can be saved permanently.
Thus, the problem discussed above can be avoided.
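The claimed sequence of steps can be pictured with a minimal sketch. It is not the patented implementation: the temporary store (a cookie in the preferred embodiment) and the repository are plain in-memory objects, the "browser" is a caller-supplied stub, and every function name (`captureLinkPortion`, `followLink`, `candidateElements`, `saveSelected`) is hypothetical.

```javascript
// Hypothetical in-memory stand-ins: in the described embodiment the temporary
// store is a cookie and the permanent store is the user's online repository.
const temporaryStore = { portion: null };
const repository = [];

// Steps 1-2: hold the mark-up of the portion containing the link before the
// first page is unloaded.
function captureLinkPortion(markup, href) {
  temporaryStore.portion = { markup, href };
  return href;
}

// Step 3: initiating the link replaces the first page with the further page.
// fetchPage is a caller-supplied stub standing in for the browser.
function followLink(href, fetchPage) {
  return fetchPage(href);
}

// Step 4: the candidate list is the further page's own elements plus the
// portion remembered from the first page.
function candidateElements(furtherPage) {
  const candidates = [...furtherPage.elements];
  if (temporaryStore.portion) {
    candidates.unshift({ source: "referring page", markup: temporaryStore.portion.markup });
  }
  return candidates;
}

// Steps 5-6: copy the user's selection into the permanent store. Clearing the
// temporary copy afterwards is an assumption, not something the text specifies.
function saveSelected(candidates, indices) {
  for (const i of indices) repository.push(candidates[i]);
  temporaryStore.portion = null;
}

// Usage with a made-up advertiser page.
const advertMarkup = '<a href="http://advertiser.example/"><img src="banner.gif"></a>';
const href = captureLinkPortion(advertMarkup, "http://advertiser.example/");
const page = followLink(href, (url) => ({ url, elements: [{ source: "page", markup: "<h1>Offer</h1>" }] }));
const candidates = candidateElements(page);
saveSelected(candidates, [0]);
console.log(repository[0].markup);
```

The point of the ordering is that the advertisement's mark-up is captured before the first page is discarded, so it is still available as a candidate at the further page.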
A problem with existing systems is that there is no way of obtaining any information about a referring page other than its URL and title. Embodiments of the invention avoid this problem by temporarily storing relevant information before the referring web page is removed from the computer's memory.
Preferably, identifying a list of candidate mark-up elements comprises identifying the nodes of the Document Object Model (DOM) of the further page, and the nodes of the DOM of the link, which represent the identified elements, and extracting the mark-up code for the identified nodes. This has the advantage that the user can recreate the link and any associated text and image at the further page and then store them. Thus the list of candidate mark-up elements that is considered includes elements determined by reference to the DOM node representing the mark-up link used to navigate to the further page.
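Extracting the mark-up code for an identified node amounts to serialising that node and its descendants back into HTML. Internet Explorer exposes this directly as an element's `outerHTML` property; a small hand-written serialiser like the sketch below is one way to get a comparable string where that property is not available. The plain-object nodes (with DOM-style `nodeType`, `tagName`, `childNodes`, and an `attributes` object) are an assumption so the sketch runs outside a browser.

```javascript
// Minimal stand-in for DOM nodes: element nodes have nodeType 1 and
// text nodes nodeType 3, as in the DOM.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const VOID_TAGS = new Set(["IMG", "BR", "HR", "INPUT", "AREA"]);

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Rebuild the mark-up for a node and its descendants, roughly what a
// browser's outerHTML property returns for an element.
function serialize(node) {
  if (node.nodeType === TEXT_NODE) return escapeText(node.nodeValue);
  const attrs = Object.entries(node.attributes || {})
    .map(([name, value]) => ` ${name}="${String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;")}"`)
    .join("");
  const open = `<${node.tagName}${attrs}>`;
  if (VOID_TAGS.has(node.tagName)) return open; // no closing tag for void elements
  const inner = (node.childNodes || []).map(serialize).join("");
  return `${open}${inner}</${node.tagName}>`;
}

// The link in the second cell of the sample page of Figure 5.
const link = {
  nodeType: ELEMENT_NODE, tagName: "A", attributes: { href: "/test.html" },
  childNodes: [
    { nodeType: ELEMENT_NODE, tagName: "IMG", attributes: { SRC: "/images/etfront40" }, childNodes: [] },
    { nodeType: TEXT_NODE, nodeValue: "Apricots are tasty" },
  ],
};
console.log(serialize(link));
```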
Preferably, the step of temporarily storing the link comprises storing the mark-up code for the link in a cookie. Preferably, the cookie is associated with the domain of the script. Cookies are associated with specific web domains, so cookies from one domain cannot be read or changed by pages or scripts originating from a different domain. The further page will usually be in a different domain from the first page, so a cookie tied to the first page's domain could not be read there. By storing the information in a cookie associated with the domain of the script, rather than with that of the document being viewed, this problem is avoided.
As a result, even if the new page or document is in a new domain, the script can still access the information held in the cookie and make it available.
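A minimal sketch of holding the link's mark-up in a cookie and reading it back. The cookie name and the domain `copyn.example` are hypothetical, and the sketch only builds and parses the strings that a script would assign to, and later read from, `document.cookie`, so it runs outside a browser. Browsers commonly limit a single cookie to about 4 KB, so a large element might not fit.

```javascript
// Hypothetical cookie name; the text does not name the cookie.
const COOKIE_NAME = "copynPendingLink";

// Build the string a script would assign to document.cookie. The mark-up is
// percent-encoded because raw ';', ',' and whitespace are not allowed in a
// cookie value. scriptDomain is the (hypothetical) service provider's domain.
function buildCookie(markup, scriptDomain) {
  return `${COOKIE_NAME}=${encodeURIComponent(markup)}; domain=${scriptDomain}; path=/`;
}

// Read the value back from a document.cookie-style string ("a=1; b=2").
function readCookie(cookieString, name = COOKIE_NAME) {
  for (const pair of cookieString.split(";")) {
    const [key, ...rest] = pair.trim().split("=");
    if (key === name) return decodeURIComponent(rest.join("="));
  }
  return null;
}

// Usage: outside a browser, simulate what document.cookie would later return.
const linkMarkup = '<a href="http://advertiser.example/"><img src="ad.gif"></a>';
const setCookie = buildCookie(linkMarkup, "copyn.example");
const jar = "session=abc; " + setCookie.split(";")[0];
console.log(readCookie(jar) === linkMarkup);
```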
Preferably, the storing and initiation of the link are performed by selecting a command from a displayable menu, preferably an Internet browser context menu. This has the advantage that a script originating from a service provider has access to information originating from other domains. Ordinarily, such access would be prohibited by web browser security protocols. However, scripts activated by the context menu have access to the Document Object Model of the page on which the context menu was activated.
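Once the context-menu script can see the page's DOM, identifying "a portion including a link" can start from the node under the mouse pointer and walk up to the enclosing link. In Internet Explorer the menu script reaches the page's window through `external.menuArguments`, and the clicked node is commonly read from that window's `event.srcElement`; the sketch below assumes such a node is already in hand and uses plain objects linked by `parentNode` so that it runs outside a browser. The function name is hypothetical.

```javascript
// Walk up from the node under the mouse pointer to the nearest enclosing link,
// so that right-clicking on a banner image still captures the whole <A> element.
function findEnclosingLink(node) {
  for (let n = node; n; n = n.parentNode) {
    if (n.nodeType === 1 && n.tagName.toUpperCase() === "A" && n.href) return n;
  }
  return null;
}

// Plain-object stand-ins for DOM nodes (nodeType 1 = element), linked by parentNode.
const body = { nodeType: 1, tagName: "BODY", parentNode: null };
const banner = { nodeType: 1, tagName: "A", href: "http://advertiser.example/", parentNode: body };
const image = { nodeType: 1, tagName: "IMG", parentNode: banner };
console.log(findEnclosingLink(image).href);
```

Checking `href` skips named anchors, which are `A` elements that are not links.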
An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 is a pictorial representation of the terminology used to describe the embodiment of the invention, for ease of understanding;
Figure 2 is a portion of a sample web page with a context menu overlaid;
Figure 3 is a view of a leaf having a number of cards;
Figure 4 is a view of a sub-leaf;
Figure 5 is a view of a sample web page;
Figure 6 is a view of the Document Object Model (DOM) of the web page of Figure 5;
Figure 7 is a flow diagram illustrating a process for identifying meaningful elements from the DOM;
Figure 8 shows how the DOM tree of Figure 6 may be traversed when identifying meaningful elements;
Figure 9 is a flow diagram illustrating a process for extracting HTML code for identified meaningful elements;
Figure 10 is a screen print showing how an element may be selected for saving;
Figure 11 is a view of a repository/user interface;
Figure 12 is a view of a typical web page showing an advertisement;
Figure 13 is a view of a context menu embodying the invention overlying the advertisement of Figure 12;
Figure 14 shows the web page of the advertisement of Figures 12 and 13 with a window enabling an element to be saved in accordance with an embodiment of the invention; and
Figure 15 shows the original advertisement that was selected, displayed in the window.
In order to understand the invention it is useful first to review the technical framework underpinning it.
When a user of the Internet browses a web page using one of the available 'web browsers', such as Netscape Communicator (NN) or MS Internet Explorer (IE), the page they see on their screen is actually a rendition of a stream of data presented to the browser in HTML format.
HTML (HyperText Markup Language), the language of the World Wide Web, consists of combinations of tags, attributes (such as size) and data/text, which are interpreted by the browser to create a potentially interactive display of information that appears fairly similar across all operating systems (such as MS Windows, Mac OS or Unix) and different browsers. The whole of a web page need not come from the same server. HTML tags allow the publisher of a web page to merge elements from different sources. In one of its most complicated manifestations, a web portal (such as my.yahoo.com) may bring in elements from many third parties: news stories from one company, stock prices from another and weather forecasts from yet another. The portal may also sell part of its page to an advertising server that constantly changes the banner advert the user sees. Often, all of this information is retrieved directly by the user's machine without passing through the publisher's server.
In other words, the web publisher can merely point the user to the locations of the various elements of the page and allow the user's machine to obtain the information directly.
The source of a page being viewed by the user is usually dynamic in its content; for example, the front page of a newspaper's web site will be constantly changing.
Occasionally pages change so frequently that some items seen on a page (such as a banner advertisement) may never be seen again by the user if they do not respond to them before the page is refreshed or changed; even a summary of news articles on a web portal will be changing, such that an interesting news story may be difficult to retrieve if it is not read at once.
HTML 4.01 is an SGML (Standard Generalised Markup Language) application conforming to International Standard ISO 8879, Standard Generalized Markup Language. The full specification is available from the World Wide Web Consortium (W3C), and the detailed HTML 4.01 Specification Recommendation is to be found at http://www.w3.org/TR/html401.
Within this specification of HTML 4.01 is the Document Type Definition ("DTD") that defines the markup language within the SGML framework. This document will be used to determine some of the rules followed by the embodiments to be described.
ECMAScript (International Standard ISO/IEC 16262) is a standardised scripting language based in large part on JavaScript (Netscape) and JScript (Microsoft). A detailed description of the language is published by ECMA in the ECMA-262 (3rd edition) standard at http://www.ecma.ch/ecma1/stand/ecma-262.htm.
CSS2 (Cascading Style Sheets, level 2) describes a style sheet language which allows authors and users to attach style (fonts, spacing, placement, size, etc.) to structured documents, including HTML documents and XML (Extensible Markup Language) applications. The latest W3C recommendation for CSS2 may be found at http://www.w3.org/TR/REC-CSS2.
The Document Object Model (DOM) Level 2 Specification defines a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. DOM Level 2 is made up of a set of core interfaces to create and manipulate the structure and content of a document, and a set of optional modules containing specialised interfaces dedicated to XML, HTML, traversing the document, etc. The DOM Level 2 Specification is believed to be close to the recommendation stage, and the latest version is published at http://www.w3.org/TR/DOM-Level-2.
The relationship between the DOM and the underlying HTML will be described later in the document.
The Extensible Markup Language (XML) is a subset of SGML that is completely described in the W3C Recommendation of February 1998, which can be found at http://www.w3.org/TR/1998/REC-xml-19980210. XML is supplemented by a raft of other specifications covering, for example, how the markup language is interpreted visually and how it can be manipulated by scripting languages. Note that an XML document may be accompanied by a DTD; HTML 4.01, as an SGML application, has its own DTD, as was mentioned earlier.
To implement embodiments of this invention, familiarity is also required with SQL/relational databases, web servers, and CGI/Perl or another interactive web server scripting or programming interface.
The following description relates to an embodiment developed to run on Microsoft's Internet Explorer browser IE (version 5) and Netscape's browser NN (release 6). It uses the ability of browsers to be customised by an application developer. Implementation in other browsers (such as Opera) requires a different user interface, but the core mechanics of the underlying invention are the same. Such browsers need to be compliant with the standards described earlier.
This description relates to the latest major released version of Microsoft's Internet Explorer web browser, version 5 (IE5), and the preview release of Netscape Navigator 6 (NN6). These browsers have many subtle differences in their implementation of the standards described, often using slightly different names for variables or functions. The embodiments to be described can be implemented in either browser; minor differences in functionality allow differing enhancements to be applied in each environment.
Microsoft's Internet Explorer browser (version 4 onwards) allows developers to add custom items to the context menu, a pop-up menu that appears on the user's screen when he clicks the right mouse button. The context menu is accessed slightly differently on Mac OS. A detailed explanation of the customisation of the context menu is available from Microsoft Corporation at http://msdn.microsoft.com/workshop/browser/ext/tutorials/context.asp.
Netscape Navigator 6 gives the developer much more flexibility to customise the browser, but the process is a little more involved. Almost any part of the NN6 interface can be customised by adding or modifying a XUL (XML-based user interface language) overlay file and providing or modifying an associated script in the application's "chrome". A chrome in Mozilla, the open-source browser development project of Netscape Corp, is a complete front end, including all aspects of graphics, layout and functionality. The concepts are explained at http://mozilla.org/xpfe/xptoolkit/overlays.html and http://mozilla.org/xpfe/xptoolkit/popups.html.
An embodiment of the invention will now be described. Referring now to Figure 1, some terminology will first be described.
An Element of a web page is defined as an HTML tag, or a meaningful collection of HTML tags, which can be saved.
An element is likely to include the URL of an item of interest to a user, rather than a copy of the item itself.
Examples of Elements include: a banner advert; a link; an image, with or without an associated link; an MPEG video; an MP3 sound file; and a table of images, which is an example of a meaningful collection of elements being classed as an Element.
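To make the list of Element kinds concrete, here is a hypothetical classifier mapping a DOM node to one of them. The rules (tag names, file extensions) are assumptions chosen to mirror the examples above; the actual criteria of the embodiment's process (Figure 7) are not reproduced. Nodes are plain objects with DOM-style fields, and tag names are assumed to be upper-case as browsers report them for HTML documents.

```javascript
// True if any descendant element of node has the given tag name.
function hasDescendant(node, tag) {
  return (node.childNodes || []).some(
    (c) => c.nodeType === 1 && (c.tagName === tag || hasDescendant(c, tag)));
}

// Hypothetical classifier mapping a DOM node to one of the kinds of Element
// listed above; returns null if the node is not treated as an Element here.
function classifyElement(node) {
  if (node.nodeType !== 1) return null;
  const src = ((node.attributes || {}).src || "").toLowerCase();
  switch (node.tagName) {
    case "A":
      return hasDescendant(node, "IMG") ? "image with link" : "link";
    case "IMG":
      return "image";
    case "EMBED":
      if (/\.mpe?g$/.test(src)) return "MPEG video";
      if (src.endsWith(".mp3")) return "MP3 sound file";
      return null;
    case "TABLE":
      return hasDescendant(node, "IMG") ? "table of images" : null;
    default:
      return null;
  }
}

const img = { nodeType: 1, tagName: "IMG", attributes: { src: "ad.gif" }, childNodes: [] };
const bannerLink = { nodeType: 1, tagName: "A", attributes: { href: "/offer.html" }, childNodes: [img] };
console.log(classifyElement(bannerLink)); // image with link
```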
A Repository is defined as an online database in which bookmarked Elements are stored. Each user can have one or more repositories.
A Card in the repository is defined as the visual representation on screen of a bookmarked element. It is customisable, but typically it looks like the original element from the original web page, surrounded by a rectangular border.
A Leaf in the repository is defined as the visual representation on screen of a set of cards. It looks like a page from a scrapbook with an index tab attached.
A View is defined as one way of categorising a set of some, or all, of the bookmarked elements in the Copyn repository, together with their attributes such as position on screen, size and background colour, and the attributes of the leaves on which they are displayed. For any given set of Elements, that is, a Repository, there can be many different Views. Views are made up of a collection of Leaves.
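The relationships between these terms can be sketched as a small data model. The class names follow the terms above, but the fields, the default card size and the `leaf()` helper are assumptions for illustration only. The key property is that an Element is stored once in the Repository, while each View arranges its own Cards for it on its own Leaves.

```javascript
// Hypothetical data model for the terms defined above.
class Card {
  constructor(elementId, { x = 0, y = 0, width = 200, height = 120, comment = "" } = {}) {
    Object.assign(this, { elementId, x, y, width, height, comment });
  }
}

class Leaf {
  constructor(title) {
    this.title = title;
    this.cards = [];
  }
}

class View {
  constructor(name) {
    this.name = name;
    this.leaves = [];
  }
  // Return the leaf with this title, creating it on first use.
  leaf(title) {
    let found = this.leaves.find((l) => l.title === title);
    if (!found) {
      found = new Leaf(title);
      this.leaves.push(found);
    }
    return found;
  }
}

class Repository {
  constructor(owner) {
    this.owner = owner;
    this.elements = new Map(); // element id -> saved mark-up
    this.views = [];
  }
  addElement(id, markup) {
    this.elements.set(id, markup);
  }
}

// Usage: one Element shown in two Views.
const repo = new Repository("jenny");
repo.addElement("e1", '<a href="/news.html">Euro 2000 tickets</a>');
const byTopic = new View("By topic");
const byDate = new View("By date");
repo.views.push(byTopic, byDate);
byTopic.leaf("News Items").cards.push(new Card("e1", { x: 40, y: 60 }));
byDate.leaf("March").cards.push(new Card("e1"));
console.log(repo.views.map((v) => `${v.name}: ${v.leaves.map((l) => l.title).join(", ")}`));
```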
In the following description and claims, no distinction is made between the visual representation on screen of cards and leaves and the underlying mark-up data or its DOM representation. This is because the visual representation is the direct result of a web browser, or other such computer program, interpreting the mark-up data representation, or its DOM equivalent, of the card or leaf and generating the resultant visual image and behaviour on screen. Hence when it is stated that a card is movable on screen, it means that the underlying mark-up language or DOM equivalent is modified such that the web-browser, or other program, displays the card in another position. In addition it means that a user interface is provided, via the browser or the like, such that the underlying mark-up, or DOM equivalent can be manipulated.
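Under this convention, "moving a card" is just a change to its mark-up followed by a redraw. The sketch below illustrates that idea with inline absolute-positioning CSS; the style properties and helper names are assumptions, and in a live page a script would more likely set `element.style.left` and `element.style.top` on the card's DOM node directly. The `overflow:auto` style is one way to obtain the scrollbars a resized card may need.

```javascript
// Moving or resizing a card changes only its underlying mark-up (here, inline
// CSS); the browser then redraws it. The style properties used are assumptions.
function cardMarkup(card) {
  return (
    `<div class="card" style="position:absolute; left:${card.x}px; top:${card.y}px; ` +
    `width:${card.width}px; height:${card.height}px; overflow:auto">${card.content}</div>`
  );
}

// Return a moved copy of the card; re-rendering cardMarkup() shows the move.
function moveCard(card, dx, dy) {
  return { ...card, x: card.x + dx, y: card.y + dy };
}

const card = { x: 10, y: 20, width: 160, height: 90, content: '<a href="/test.html">Apricots</a>' };
console.log(cardMarkup(moveCard(card, 30, 15)));
```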
Thus, in Figure 1, a browser window is shown generally at 10. Within the browser window 10 is shown a leaf 12 which contains cards. One such card is shown at 14, although typically a leaf would contain several cards. The card contains an element 16, which comprises a meaningful HTML element as described above. The card also includes a space 18 for a user-defined comment, the domain name and other text. The leaf is one of a number of leaves in the repository, and each leaf can be accessed by clicking on a leaf index tab 20. In the example shown, there are three index tabs 20, labelled "Default", "News Items" and "Hotels". The leaf shown is the "News Items" leaf, and the "News Items" index tab 21 is shown highlighted. At the top right of the screen is a wastebin icon 22 which allows the user to remove a leaf by sending it to the wastebin.
There now follows a description of the interface whereby the web user can save a part or the whole of a web page.
The client interface allows the web user to save an element of a web page, or a link to the whole web page, to the repository; to follow the element's link immediately; to email the element to someone else; and/or to open the repository.
Different set-ups can be configured for different situations. The interface allows the following options for saving an element:
    • the element may be stored in a specified part of the repository, such as personal, private-shared, pooled or public;
    • the element may be categorised in one or more customised classifications, as opposed to the default classification; and
    • the element may be described using one or more different types of identification, such as a customised name, the text of the link, the title of the page or a visual representation (including the image portion of the element).
Thus, the client interface permits elements to be saved according to a defined degree of access, a defined categorisation and a defined description.
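The three save options can be captured as a single request object. This is a hypothetical shape, not the service's actual protocol: the field names, the default category name and the vocabularies of access levels and description types are assumptions drawn from the wording above.

```javascript
// Hypothetical vocabularies for the three save options described above.
const ACCESS_LEVELS = ["personal", "private-shared", "pooled", "public"];
const DESCRIPTION_TYPES = ["custom name", "link text", "page title", "visual representation"];

// Build a save request, rejecting options outside those vocabularies.
function makeSaveRequest(markup, options = {}) {
  const {
    access = "personal",
    categories = ["Default"],
    description = { type: "visual representation" },
  } = options;
  if (!ACCESS_LEVELS.includes(access)) throw new Error(`unknown access level: ${access}`);
  if (categories.length === 0) throw new Error("at least one category is required");
  if (!DESCRIPTION_TYPES.includes(description.type)) {
    throw new Error(`unknown description type: ${description.type}`);
  }
  return { markup, access, categories: [...categories], description };
}

const request = makeSaveRequest('<a href="/hotels.html">Hotel deals</a>', {
  access: "pooled",
  categories: ["Hotels", "Basingstoke"],
  description: { type: "custom name", value: "Weekend break ideas" },
});
console.log(request.access, request.categories.join(", "));
```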
Different types of client interface can be used for different situations, and it is likely that more than one may be available to the user in a given situation. Some interfaces are only available to the user if the web publisher has enabled them on their site, while other interfaces are always available to the web user by virtue of the fact that they are registered system users. The following description refers to the implementation of an interface which does not require the web publisher to activate the service and is easy to use, but is limited to the newest web browsers. This interface uses extensions to the context menu of the user's browser, accessed in Microsoft Windows by clicking the right mouse button when the mouse is over the relevant element or page background. In the example to be described, it is assumed that the user has previously downloaded and incorporated the extension into their browser.
Turning now to Figure 2, an example of the context menu is shown. The user has previously registered with the service and has incorporated the relevant proprietary extensions into her browser. Whenever she wants to save an element of a page (or indeed the frame or page itself), she simply opens up the context menu by using the right mouse button and then selects the appropriate service option.
In Figure 2, the user has opened the homepage 30 of their Internet Service Provider. The context menu 32 is shown overlying the homepage. The context menu includes two extensions: Add to Copyn 34, which adds an element to the repository, and Launch Copyn 36, which opens the user's repository. Other options may be added and customised to the user's requirements. In the example shown in Figure 2, the context menu has been opened with the mouse pointer overlying the link about Euro 2000 tickets. It is important to understand that if the user selects the Add to Copyn extension 34, it is this HTML element, or collection of elements, that will be stored in the repository, and not the entire homepage or the homepage URL.
When the user chooses either to add the element 34 or to launch the repository 36, the application checks for the appropriate cookie that would provide the server with the username and password. If the cookie does not exist, the user is asked to log in to the service or to register as a new user. A cookie is then saved on the user's machine that will identify her the next time she accesses the service. In both cases the Element is saved in the appropriate location in the repository, assuming it has not already been saved, and, if the user has selected the 'Launch Copyn' option 36, her default repository is opened in a new browser window. Using a single user account with cookies means that it is very easy for the user to set up Copyn for multiple browsers and machines, thereby enabling the service to be shared between office and home, etc.
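The decision flow just described can be sketched as a single handler. The cookie name `copynUser`, the in-memory `repositories` object and the `loginOrRegister` callback are hypothetical stand-ins; the sketch reads only a user identifier from the cookie and does not model the password or the server round trip.

```javascript
// Sketch of what happens when "Add to Copyn" or "Launch Copyn" is chosen.
function handleMenuCommand(command, { cookies, element, repositories, loginOrRegister }) {
  // 1. Identify the user from the cookie, or ask them to log in or register
  //    (a real client would then write the cookie for next time).
  const match = /(?:^|;\s*)copynUser=([^;]*)/.exec(cookies);
  const user = match ? decodeURIComponent(match[1]) : loginOrRegister();
  if (!repositories[user]) repositories[user] = [];
  const repo = repositories[user];

  // 2. Save the element unless it is already in the repository.
  if (element && !repo.includes(element)) repo.push(element);

  // 3. "Launch Copyn" also opens the default repository in a new window.
  return { user, openRepository: command === "launch" };
}

const repositories = {};
const ask = () => "newuser"; // pretend the user registered as "newuser"
handleMenuCommand("add", { cookies: "copynUser=jenny", element: "<a>Euro 2000</a>", repositories, loginOrRegister: ask });
handleMenuCommand("add", { cookies: "copynUser=jenny", element: "<a>Euro 2000</a>", repositories, loginOrRegister: ask });
const result = handleMenuCommand("launch", { cookies: "", element: null, repositories, loginOrRegister: ask });
console.log(repositories.jenny.length, result);
```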
The Repository Interface will now be described.
The user can choose between a number of different customisable web-based interfaces via which the saved elements can be viewed and manipulated. The two preferred interfaces are a free-form "scrapbook"-like representation, shown in Figures 3 and 4, and a hierarchical tabular representation, shown in Figure 11 and referred to later.
The user can toggle from one representation to another and the simple, hierarchical tabular representation of Figure 11 is always available, for spring-cleaning purposes, for a quick overview of the contents of their repository, or for any other reason.
Referring now to Figures 3 and 4, the repository interface provides the user with a wide range of functionality, including categorisation, on-screen display, a variety of services and means for sharing and connecting with other users.
Figures 3 and 4 are screen shots of the repository interface as it is seen by a user. In this case the user is displaying the interface in the Microsoft Internet Explorer browser. The interface includes a default categorisation 40 and a series of custom categorisations 42 which are defined by the user. In this case the user has defined four categories entitled, News Items, Basingstoke, Jenny Photos and Humour. The default category may be viewed as an in-tray for new elements saved.
The user of the system may be provided with a number of default categories, which can be changed by renaming, deletion or addition of fresh categories. Categories are hierarchical; that is, Cards can be placed in categories, sub-categories, sub-sub-categories, etc. A single Card can be placed in many different categories or sub-categories at the same time.
A given categorisation of a given set of stored elements, together with their attributes such as position and size, is referred to as a "view" of those Cards. Each category is represented by a 'Leaf'.
For example, imagine a set of "bookmarks" about individual restaurants, in which each bookmark has been categorised by the location, type of cuisine and price range of the associated restaurant. Then three views of the bookmarks can be set up: a "location" view, a "type of cuisine" view and a "price range" view.
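The restaurant example can be made concrete: each view groups the same bookmarks into leaves by a different attribute. The restaurant names and attribute values below are made up, and representing a view as a `Map` from leaf title to bookmark names is an assumption for illustration.

```javascript
// Group the same bookmarks into leaves by one attribute; each grouping is a view.
function buildView(bookmarks, attribute) {
  const leaves = new Map(); // leaf title -> bookmark names
  for (const b of bookmarks) {
    const title = b[attribute];
    if (!leaves.has(title)) leaves.set(title, []);
    leaves.get(title).push(b.name);
  }
  return leaves;
}

// Made-up restaurants for illustration.
const bookmarks = [
  { name: "Trattoria A", location: "Basingstoke", cuisine: "Italian", price: "££" },
  { name: "Curry House B", location: "Reading", cuisine: "Indian", price: "£" },
  { name: "Pizzeria C", location: "Basingstoke", cuisine: "Italian", price: "£" },
];
const views = Object.fromEntries(
  ["location", "cuisine", "price"].map((attr) => [attr, buildView(bookmarks, attr)]));
for (const [name, leaves] of Object.entries(views)) {
  console.log(name, [...leaves.keys()].join(" | "));
}
```

Each bookmark appears exactly once per view, but on a different leaf in each.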
The on-screen display of the illustrative "scrapbook" interface represents any category (or sub-category) of elements by the relevant set of cards displayed on the appropriate leaf. The layout of cards on a leaf is similar to the layout of items on a page in a scrapbook, and the cards may be moved around by the user within a leaf, like loose cuttings, using "drag-and-drop". The cards 'remember' their new positions. The user can move a card from one leaf to another (thus re-categorising it), or to a "rubbish bin" (thus deleting it), using "drag-and-drop". The user can 'resize' any card, with the card's contents being scaled or wrapped accordingly inside the card's border. Within the border of any card, the user can place their own comments and/or other information which they select from a standard list of fields, such as date bookmarked, source page, etc. The user can toggle between different views of a given set of cards.
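Dropping a card on another leaf or on the rubbish bin can be sketched as a move between lists. Here a view is modelled as an object mapping leaf titles to arrays of card ids, and the `"rubbish-bin"` target is a hypothetical sentinel; both are assumptions for illustration.

```javascript
// Dragging a card to another leaf re-categorises it; dropping it on the rubbish
// bin deletes it from the view. Leaves are modelled as arrays of card ids.
function dropCard(view, cardId, fromLeaf, target) {
  const source = view[fromLeaf] || [];
  const index = source.indexOf(cardId);
  if (index === -1) throw new Error(`${cardId} is not on leaf "${fromLeaf}"`);
  source.splice(index, 1);
  if (target !== "rubbish-bin") {
    if (!view[target]) view[target] = [];
    view[target].push(cardId);
  }
  return view;
}

const view = { Default: ["c1", "c2"], "News Items": [] };
dropCard(view, "c1", "Default", "News Items"); // re-categorise c1
dropCard(view, "c2", "Default", "rubbish-bin"); // delete c2
console.log(view);
```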
A number of services can also be provided. The user can upload and merge existing "bookmark/favorite" collections from their browser(s) into the repository at any time.
This is particularly useful when a user first registers for the service. The bookmarks stored in the repository can be clicked through just as they would be on the original referring page. One current exception is where clicking the link would execute a JavaScript program. The user is kept informed about bookmarked elements that have expired or gone stale, or whose content has changed.
Management information is available to the user, for example: listing those bookmarks which have not been clicked through for longer than a given length of time; or listing those bookmarks which are most often accessed.
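Both management reports are simple queries over per-bookmark usage data. The field names `lastClicked` (milliseconds) and `clicks` are assumptions, as are the sample figures.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Bookmarks not clicked through for longer than maxIdleDays.
function notClickedSince(bookmarks, now, maxIdleDays) {
  return bookmarks
    .filter((b) => now - b.lastClicked > maxIdleDays * DAY_MS)
    .map((b) => b.name);
}

// The n bookmarks accessed most often.
function mostAccessed(bookmarks, n) {
  return [...bookmarks].sort((a, b) => b.clicks - a.clicks).slice(0, n).map((b) => b.name);
}

// Timestamps are milliseconds; the figures are made up.
const now = 100 * DAY_MS;
const usage = [
  { name: "Hotel deals", lastClicked: 10 * DAY_MS, clicks: 5 },
  { name: "Euro 2000 tickets", lastClicked: 95 * DAY_MS, clicks: 20 },
  { name: "Restaurant guide", lastClicked: 50 * DAY_MS, clicks: 1 },
];
console.log(notClickedSince(usage, now, 30));
console.log(mostAccessed(usage, 2));
```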
The user can send any one or more of their bookmarked elements, either individually or as a collection, to anyone else who has Internet access. This can be by email or as a message within the system. The sender can then categorise those particular bookmarks as having been emailed to that particular recipient; and both sender and recipient have the option of whether the sent bookmarks are linked or copied.
Various sharing and collaboration facilities are available. A user can create a "public" repository which, at the owner's option, any other registered user can read from or add to. This facility allows users to create different types of repository, ranging from a "free-for-all" bulletin board to a "read-only" information site, such as a restaurant guide with links to restaurant web sites together with the repository owner's comments.
A user can authorise others, for example specially invited users, to have full access to and use of a "pooled" repository. This service is particularly useful to clubs, societies and the like, where members share a common interest.
A user such as a school, university or corporation can create a "private-shared" repository, for example running on their own web/database server, which enables students and/or staff to use the functionality of the system to collaborate on web-based research activities. A variety of options are available, giving different individual users different privileges such as read, write, modify, etc.
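The repository types above imply different access rules. The sketch below is a hypothetical permission check: the field names (`openToAdditions`, `members`, `privileges`) and action names are assumptions, and a real service would enforce this on the server.

```javascript
// Hypothetical access check for the repository types described above.
// The owner can always do everything; other users depend on the type.
function can(repo, user, action) {
  if (user === repo.owner) return true;
  switch (repo.type) {
    case "public": // anyone registered may read; adding is at the owner's option
      return action === "read" || (action === "add" && repo.openToAdditions === true);
    case "pooled": // invited members get full access
      return repo.members.includes(user);
    case "private-shared": // per-user privileges such as read, write, modify
      return (repo.privileges[user] || []).includes(action);
    default: // personal repositories are not shared
      return false;
  }
}

const guide = { type: "public", owner: "ann", openToAdditions: false };
const club = { type: "pooled", owner: "bob", members: ["cat", "dan"] };
const school = { type: "private-shared", owner: "school", privileges: { student1: ["read", "write"] } };
console.log(can(guide, "eve", "read"), can(guide, "eve", "add"));
console.log(can(club, "cat", "modify"), can(club, "eve", "read"));
console.log(can(school, "student1", "write"), can(school, "student1", "modify"));
```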
In the Figure 3 example, the leaf 40 is the default leaf, which is shown highlighted. The leaf contains seven cards 44, 46, 48, 50, 52, 54 and 56 and the waste bin 46. The cards shown are selected to show examples of some of the different types of meaningful HTML elements which can be saved. Element 44 is an HTML DIV containing a link element; a DIV element divides a page into a number of logical sections. Here, an image has a brief description of the story, and clicking on the image or the link will take the user to the linked web site as if they had clicked on the original web page.
Element 46 is a simple text link. Element 48 is a 2x2 table of advertisements, of which the bottom-left and top-right advertisements, 58 and 60, have links, identified by their bold borders.
Element 50 comprises text extracted from a linked news headline; the user chose to keep the text but drop the link. Element 52 is a banner advertisement in which an image is embedded in a link element.
Element 54 combines an image map and an image. The full map functionality is retained; for example, if the user clicks on the "Lawn and Patio" tab 62, they will be taken to that section of the amazon.com web site. Element 56 is also a DIV element comprising a link and some text, but it has been resized; the content has automatically acquired scrollbars so that all of it can be seen.
The user can move these seven cards around the screen, and resize them. The cards remember their size and location, so that when the user next returns to the repository, the lay-out of the view is preserved from the previous visit.
Figure 4 shows a leaf from the News Items category of Figure 3. It can be seen that the News Items category comprises seven sub-categories 64, identified as Asia, America, Africa, Europe, Sport, Angus Deayton and Local.
Here the Europe sub-category 66 has been selected to display a leaf containing five cards 68. A waste bin 40 is also displayed in the leaf.
The manner in which the described embodiments operate will now be explained.
An understanding of the relationship between the HTML and its DOM representation within the browser, and hence its availability to the browser scripting language, is essential to comprehend the manner of operation and will be described with reference to a simple example.
There are many subtle, and some significant, differences in the way that IE and NN turn the raw HTML of a web page into the DOM, the set of objects which can be accessed and modified by scripts. However, the embodiments discussed rely almost exclusively on functionality common to both browsers, deviating from this only when a particular aspect of one browser offers significant implementation efficiency.
Figure 5 shows a simple web page composed of some images and text. It is similar to the Card 40 shown in Figure 3. The first line ('This is my table:') appears in a slightly larger font and, although this is not visible in the drawing, in red. Below this text is a 2x2 table. The first column comprises two cells showing images; the second column includes images and text. Further subtleties can be seen in that the first-row entries are aligned at the top of the table cells and the bottom-row entries are aligned along the bottom.
The raw HTML used by the browser to construct this page is as follows:

    <HTML>
    <HEAD>
    <TITLE>An HTML/DOM Illustration</TITLE>
    </HEAD>
    <BODY bgcolor="beige">
    <FONT color="darkred" size="+2">This is my table:</FONT>
    <BR><BR>
    <TABLE border="2" cellpadding="2" bordercolor="darkblue">
    <TR valign="top">
    <TD><IMG SRC="/images/USWEST.gif"></TD>
    <TD><A href="/test.html"><IMG SRC="/images/etfront40">Apricots are tasty</A></TD>
    </TR>
    <TR valign="bottom">
    <TD><A href="/experiment.html"><IMG SRC="/images/Strange"></A></TD>
    <TD>Bananas are better!<IMG SRC="/images/USWEST.gif"></TD>
    </TR>
    </TABLE>
    </BODY>
    </HTML>

Figure 6 is a summary of the DOM representation of the page. The picture shows only a small subset of the information available in the DOM about the content of the page. Specifically, it shows only the "nodeType" (1 = ELEMENT_NODE, 3 = TEXT_NODE), the "tagName", the number of "childNodes", the non-default "attributes" of each node and the "nodeValue" of any text nodes.
It can be seen that the DOM representation mirrors the hierarchy of the raw HTML that was used to create the page. Each node has one parentNode and each element node can have zero, one or more childNodes.
The DOM representation of the page can be interrogated dynamically and, within constraints, modified without editing the underlying HTML. For example, the position of elements on the screen can be changed by modifying some of their attributes, or the value of text strings can be changed. In the above example, if we changed the value of document.getElementsByTagName("A")[0].childNodes[1].nodeValue to "Oranges are tasty", our web page would be modified on screen so that it no longer told us that "Apricots are tasty" but that "Oranges are tasty".
Pages can be created on the fly by a script manipulating the DOM directly, without the browser needing to read any raw HTML other than the code of the script itself.
There now follows a description of the manner in which the user saves elements to the repository.
The operation of a user saving elements to the repository may be broken down into three main steps: setup and installation; finding the meaningful elements; and extracting the HTML for the meaningful elements found and returning it to the server.
Setup and installation require customisation of the browser context menu and installation on the user's machine.
The finding of the meaningful elements can be subdivided into the steps of: using the context menu as an interface with the user's mouse over a node of interest; identifying the node supplied by the context menu; traversing the tree to look for collections of meaningful elements; finding related nodes if a given node requires them; and creating meaning where there is none.
The HTML extraction and return to the server can be subdivided into the steps of: extracting the raw HTML or DOM sub-tree from the selected nodes; passing the HTML data to a new window; selection by the user; and storage by the server.
These three main steps will now be described in turn.
SET UP AND INSTALLATION

To enable the customisation of the browser context menu, the following operations are performed. In Internet Explorer, the user adds a new key in the Windows registry under HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\MenuExt\"My Menu Text", where "My Menu Text" is the text required for the new context menu entry.
The default value of the key is set to the URL of the page containing the script the developer wishes to execute if the user selects this menu entry.
The menu entry can be restricted only to appear in certain circumstances, for example only if the mouse is over an image. This is achieved by creating a binary value called Contexts under the key and setting its value accordingly.
In NN6, a new XUL overlay file, for example navigatorCopynOverlay.xul, is created which defines a new menu item as part of the context popup menu; the menu is referenced by setting the id of the <popup> element appropriately, namely <popup id="context">. An 'oncommand' value is attached to the menu item with the name of the script function to be called, and the application is told where it can find the script via an <html:script> tag.
Finally, the new overlay file is included in the global overlay file, in this case navigatorOverlay.xul, by adding the following line:

    <?xul-overlay href="chrome://[path]/navigatorCopynOverlay.xul"?>

Optionally, submenu items can be added to the NN6 context menu and their appearance made conditional on the type of node which the mouse pointer was over when the context menu was activated.
Installation is relatively simple.
For example, to extend the IE browser, a small registry file can be created which the user opens from the system web site. Doing so, once the user has given the appropriate permission, will add the key to the user's registry.
Installing the extensions in NN6 can be achieved by presenting the user with a suitable signed script. A signed script is a normal script that has a digital signature that confirms the authenticity of the script. A signed script can request special privileges, not usually available to a browser script, such as the ability to modify the browser or access files on the user's system.
If the user gives the script the appropriate permission, the modifications described above can be installed.
The step of finding the meaningful elements, and its various sub-steps, will be described with reference to Figure 7.
To select an element to be added to the repository, the user moves her mouse to that element and then activates the context menu over the item of interest. This is shown at step 100. Thus, the context menu is used as an interface with the user's mouse over the node of interest.
The user can now select the add element option (34 in Fig. 2) to add an element to the repository. At step 102, a handle to the DOM Node over which the mouse was positioned when the context menu appeared is returned to the script.
In IE this Node can be accessed from 'parentwin.event.srcElement' and in NN6 from 'document.popupNode'. Both are of the same type in the DOM, an HTML Node. This Node will be referred to as 'myNode' for the purposes of the following.
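The browser-specific lookup of myNode can be sketched as a small JavaScript helper. This is an illustration only: the function name getContextNode is hypothetical, and the plain objects in the usage lines stand in for the real IE window (called parentwin in the text) and the NN6 document, so the sketch runs outside a browser.

```javascript
// Hypothetical helper: return the node the mouse was over when the
// context menu opened, trying the IE property first, then the NN6 one.
function getContextNode(parentwin, doc) {
  if (parentwin && parentwin.event && parentwin.event.srcElement) {
    return parentwin.event.srcElement; // Internet Explorer
  }
  if (doc && doc.popupNode) {
    return doc.popupNode; // Netscape 6 (XUL context menu)
  }
  return null; // no node available
}

// Usage with plain-object stand-ins instead of real browser globals.
const img = { nodeType: 1, tagName: "IMG" };
const myNode = getContextNode({ event: { srcElement: img } }, {});
console.log(myNode.tagName); // IMG
```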
Identification of Node supplied by Context Menu

At step 104, the script identifies the type of myNode (via myNode.nodeType). The options of interest in the HTML implementation are typically types 1 and 3. Type 1 is an ELEMENT_NODE, which means that the node received is an HTML Element, and type 3 is a TEXT_NODE. Text nodes hold all the text data outside the HTML '<' and '>' tag brackets. Often text nodes are nothing more than the carriage returns between two lines in an HTML file but, more interestingly, this is where the text shown on the screen can be obtained from the DOM. In the DOM representation of Figure 6, a large number of TEXT_NODEs consisting of carriage returns and white space were omitted for simplicity.
Element nodes can be further distinguished by their tagNames, as can be seen from Figure 6. Different useful data can be obtained from each tag type. For example, the source of an image file can be obtained from the 'SRC' attribute of an '<IMG>' tag, or the row and column data from the childNodes of a '<TABLE>' tag.
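As a minimal sketch of step 104 and the per-tag data described above, the hypothetical helper below classifies a node by nodeType and tagName. It assumes plain-object stand-ins for DOM nodes (attributes held as a simple name/value object rather than a NamedNodeMap), and it looks through THEAD/TBODY/TFOOT because browsers insert an implicit TBODY between a TABLE and its rows.

```javascript
// W3C DOM node type constants (same values as Node.ELEMENT_NODE / Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: classify a node and pull out tag-specific data.
function describeNode(node) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    // Whitespace-only text nodes (line breaks between tags) carry no visible content.
    return text ? { kind: "text", text } : { kind: "whitespace" };
  }
  if (node.nodeType !== ELEMENT_NODE) return { kind: "other" };
  const tag = node.tagName.toUpperCase();
  if (tag === "IMG") return { kind: "image", src: node.attributes.src };
  if (tag === "TABLE") {
    // Rows may sit directly under TABLE or inside a (possibly implicit) section element.
    const rows = node.childNodes
      .filter(c => c.nodeType === ELEMENT_NODE)
      .flatMap(c => (["THEAD", "TBODY", "TFOOT"].includes(c.tagName) ? c.childNodes : [c]))
      .filter(c => c.nodeType === ELEMENT_NODE && c.tagName === "TR");
    const cellsPerRow = rows.map(r =>
      r.childNodes.filter(c => c.nodeType === ELEMENT_NODE && ["TD", "TH"].includes(c.tagName)).length);
    return { kind: "table", rows: rows.length, cellsPerRow };
  }
  return { kind: "element", tag };
}

// Usage: a stand-in for the 2x2 table of Figure 5, with the implicit TBODY.
const cell = () => ({ nodeType: 1, tagName: "TD", attributes: {}, childNodes: [] });
const row = () => ({ nodeType: 1, tagName: "TR", attributes: {}, childNodes: [cell(), cell()] });
const table = {
  nodeType: 1, tagName: "TABLE", attributes: { border: "2" },
  childNodes: [{ nodeType: 1, tagName: "TBODY", attributes: {}, childNodes: [row(), row()] }],
};
console.log(describeNode(table)); // { kind: 'table', rows: 2, cellsPerRow: [ 2, 2 ] }
```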
At step 106, myNode is examined to determine whether it is a meaningful element according to the defined rules. If it is, at step 108 the element is added to the list of meaningful elements.
The script now traverses up and down the Node tree, looking for meaningful collections of elements by looking for meaningful ancestors and descendants. For example, from a link ('<A>') the script looks at all the childNodes, and their childNodes and so on, to search for text nodes or image tags that form part of the link. The script then looks up at the parentNode, and its parentNode etc., until it reaches the document '<BODY>', which is the highest-level node that could be of interest in this context, noting on the way whether the link is part of a '<TABLE>', '<FORM>', '<DIV>', '<SPAN>' node etc., each of which could represent the common ancestor of a meaningful collection of elements.
In Figure 7, at step 110 the process first looks for childNodes. If there are any, the handle of each childNode is in turn passed to the script at step 112, and steps 102 to 110 are repeated for each childNode. The process then checks at step 114 whether the parentElement of the current element is the BODY element. If it is not, at step 116 the handle of the parent element is passed to the script and steps 102 to 114 are repeated. If the answer at step 114 is yes, the process asks at step 118 whether it is policy to capture BODY elements. If so, the BODY element is added to the list of meaningful elements at step 120. In either case, the script then ends at step 122.
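The Figure 7 flow can be sketched in JavaScript as below. The set of "meaningful" tags is a hypothetical stand-in for the DTD-derived rule set, the el/text helpers build plain-object nodes in place of a browser DOM, and, following the worked example of Figure 8, ancestors are tested individually on the way up rather than re-descending into their other children.

```javascript
// Minimal stand-ins for DOM nodes, so the sketch runs outside a browser.
function el(tagName, attributes, children) {
  const node = { nodeType: 1, tagName, attributes, childNodes: children, parentNode: null };
  for (const child of children) child.parentNode = node;
  return node;
}
function text(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [], parentNode: null };
}

// Hypothetical rule set; the text derives its rules from the HTML DTD.
const MEANINGFUL_TAGS = new Set(["A", "IMG", "TABLE", "FORM", "DIV", "SPAN", "MAP"]);
const CAPTURE_BODY = true; // the policy tested at step 118

function isMeaningful(node) {
  if (node.nodeType === 3) return node.nodeValue.trim() !== ""; // visible text
  return node.nodeType === 1 && MEANINGFUL_TAGS.has(node.tagName);
}

// Steps 102-112: test a node, then recurse into each of its childNodes.
function collectDown(node, found) {
  if (isMeaningful(node)) found.push(node);
  for (const child of node.childNodes) collectDown(child, found);
}

// Steps 114-120: climb through the ancestors until the BODY element is reached.
function findMeaningful(myNode) {
  const found = [];
  collectDown(myNode, found);
  let current = myNode.parentNode;
  while (current && current.tagName !== "BODY") {
    if (isMeaningful(current)) found.push(current);
    current = current.parentNode;
  }
  if (current && CAPTURE_BODY) found.push(current); // step 120
  return found;
}

// Figure 8: the link in the top-right cell of the Figure 5 table.
const link = el("A", { href: "/test.html" }, [
  el("IMG", { src: "/images/etfront40" }, []),
  text("Apricots are tasty"),
]);
const body = el("BODY", {}, [el("TABLE", {}, [el("TR", {}, [el("TD", {}, [link])])])]);
console.log(findMeaningful(link).map(n => n.tagName || "#text"));
// [ 'A', 'IMG', '#text', 'TABLE', 'BODY' ]
```

The result matches the five items noted in the walk-through of Figure 8: the link, its image, its text, the enclosing table and the BODY.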
Looking at this process in more detail, and referring to Figure 8, consider the example HTML page and the DOM of Figure 6. If the user activates the context menu over the image or text in the top right-hand cell of the table, myNode will refer to the Node second from the left in the penultimate row of the diagram, shaded node 130. This is an Element Node representing an anchor tag ('<A>'), and its descendants represent a meaningful collection of elements, so this node must be noted. The Node tree is now traversed looking for meaningful descendants and ancestors.
First, the childNodes of myNode are located, and two Nodes 132, 134 are obtained, shown shaded in Figure 8. These are Element Node 132 for an '<IMG>', another meaningful element to be noted, and a Text Node 134 stating 'Apricots are tasty', which is also a meaningful element despite the fact that technically this Node is not an element. The manner in which this type of Node is dealt with will be discussed later. Again, this element is noted. Three meaningful elements have now been captured.
The search is then reversed and the parentNode 136 of myNode is examined. This is an Element Node for a Table Data ('<TD>') tag representing a single cell in our table.
For the time being this is considered not to be a meaningful element, as will be discussed. This Node's parentNode 138 is then examined: an Element Node for a Table Row ('<TR>') tag. Again, this is not considered to be a meaningful element.
The next parentNode 140 is examined: an Element Node for the Table ('<TABLE>') tag that represents the whole of our 2x2 table. This represents a meaningful collection of elements, the whole table, and is noted.
The parentNode of the TABLE is the BODY 142 of the whole document, which again represents a meaningful collection of elements and is also the stopping point for our Node traversal. Capturing the body of the page, as represented by the BODY element, is different from bookmarking the location of the page. For example, the front page of a newspaper will change from day to day, so a user who wishes to capture the front page on a special occasion will actually need to capture the body of the document rather than the URL of the page.
In practice this Element and its descendants may not be captured, as the amount of data involved may be quite large. If it is decided to capture it, it cannot be saved 'as-is'; its content must be put into a '<DIV>' Element which can be stored in and retrieved from the database and displayed within the confines of another document. The manner in which such a node is handled will again be discussed later. DIV and SPAN elements can be used to create freely positionable "sub-pages". The content in a DIV or SPAN element can be set to move with its parent Element, hidden or made visible, and occasionally even resized in proportion to the DIV or SPAN element.
A rule set is used to identify 'meaningful' Nodes, to decide when to stop searching up or down the tree, and to determine any special treatment of Nodes, such as that of the BODY Element above. This rule set is based on the DTD for HTML, with as little overruling as possible. This means that keeping the system up to date is more straightforward as the specification of HTML changes, and it also provides an approach to generalising the technique described to other markup languages that come with their own DTDs.
For some types of nodes the script must also find associated or related nodes or data. A second set of rules is used to facilitate this. For example, if a user activates the context menu over an image map ('<MAP>'), the script must find the image that uses the map; the collection of images in the document can be obtained from the array of image Nodes held in 'document.images' within the DOM. MAP elements can also be applied to OBJECT and INPUT elements, so these must also be searched to find the appropriate element to be matched to the MAP. It is then a simple matter to scan through these to find the images, objects and inputs that use an image map, and in particular the one using the image map over which the mouse was placed. In another situation, style sheets or style definitions may be needed to interpret the class attributes of nodes. This may be done in one of two ways: the script could locate and load the appropriate style sheets and cssRules, or the script could record the non-default style settings of the node itself. It is preferred to extract the style information of each node independently, but this is not essential.
Alternatively, global style settings can be captured by a straightforward DOM function call.
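Matching a MAP to the elements that use it can be sketched as follows. The sketch assumes the usual HTML convention that a USEMAP value is '#' followed by the MAP's NAME; the function name usersOfMap and the plain-object candidates (standing in for document.images and the OBJECT and INPUT elements) are illustrative.

```javascript
// Hypothetical helper: given a MAP element, return the IMG/OBJECT/INPUT
// candidates whose usemap attribute refers to it (usemap="#name").
function usersOfMap(mapNode, candidates) {
  const name = mapNode.attributes.name;
  return candidates.filter(node => {
    const ref = node.attributes.usemap;
    // Accept the standard "#name" form and, leniently, a bare "name".
    return typeof ref === "string" && ref.replace(/^#/, "") === name;
  });
}

// Usage with stand-ins for the MAP under the mouse and document.images.
const map = { tagName: "MAP", attributes: { name: "tabs" } };
const images = [
  { tagName: "IMG", attributes: { src: "/nav.gif", usemap: "#tabs" } },
  { tagName: "IMG", attributes: { src: "/logo.gif" } },
];
console.log(usersOfMap(map, images).map(n => n.attributes.src)); // [ '/nav.gif' ]
```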
In some cases, non-meaningful elements need special treatment to make them meaningful.
Earlier it was stated that '<TD>' and '<TR>' tags did not represent meaningful collections of elements. In isolation they do not: without a '<TABLE>' tag they do not represent well-formed HTML. To the user, however, it is appealing to select rows from tables or groups of adjacent cells. It is therefore made possible to select combinations of nodes which share a common ancestor node type. For example, table data or table rows can be lifted from the table. In this situation the script creates a new ancestor of the appropriate type, possibly using the formatting attributes of the actual table from which they are being selectively extracted. A third set of rules, referred to later, is used to facilitate this.
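A possible shape for the rule that synthesises a missing TABLE ancestor is sketched below. Which formatting attributes to copy from the source table is an assumption, as are the helper names; the example reuses the attributes of the Figure 5 table.

```javascript
// Formatting attributes copied from the source table (an assumed list).
const TABLE_FORMAT_ATTRS = ["border", "cellpadding", "cellspacing", "bordercolor", "width"];

// Hypothetical helper: rows lifted out of a table are not well-formed HTML
// on their own, so a new TABLE ancestor is created around them.
function wrapRowsInTable(originalTable, rowHTML) {
  const attrs = TABLE_FORMAT_ATTRS
    .filter(name => originalTable.attributes[name] !== undefined)
    .map(name => ` ${name}='${originalTable.attributes[name]}'`)
    .join("");
  return `<TABLE${attrs}>${rowHTML.join("")}</TABLE>`;
}

// Adjacent cells need a synthetic TR as well as a TABLE.
function wrapCellsInTable(originalTable, cellHTML) {
  return wrapRowsInTable(originalTable, [`<TR>${cellHTML.join("")}</TR>`]);
}

// Figure 5's table: border="2" cellpadding="2" bordercolor="darkblue".
const figure5Table = { attributes: { border: "2", cellpadding: "2", bordercolor: "darkblue" } };
console.log(wrapCellsInTable(figure5Table, ["<TD>Bananas are better!</TD>"]));
// <TABLE border='2' cellpadding='2' bordercolor='darkblue'><TR><TD>Bananas are better!</TD></TR></TABLE>
```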
A list of the meaningful Elements and common ancestors of meaningful collections of Elements has now been obtained.
The third stage of the process is to extract the HTML for these meaningful Elements and return it to the server.
Having drawn up a list of meaningful Elements, or collections of Elements, the script now extracts the required data from the DOM for each of them in turn. This data will then be passed to a new window before being sent to the server. This process is illustrated in Figure 9.
There is a choice between extracting the raw HTML and copying the DOM sub-tree from the selected Nodes.
The HTML represented by the Elements and their descendants can be recreated, or copies of the relevant sub-trees of the DOM itself can be made. The choice in practice depends on the performance of the different browsers at extracting the data or at copying the DOM subtrees.
If the implied raw HTML is recreated, a number of techniques may be used. It must be noted that this HTML may have been created by a script on the publisher's web site and may not represent the actual HTML passed from the web site's server. Alternative approaches will be described later.
Referring back to Figure 8, consider node 130, which relates to a link containing an image and the text 'Apricots are tasty'. The whole of the process must be repeated for each meaningful Element in the list. Referring to Figure 9, a blank string "myHTML" is created at step 150. At step 152 a check is made whether the element is of the type ELEMENT_NODE. If not, a check is made at step 154 to determine whether the element is of the type TEXT_NODE. If, at step 152, the element is determined to be an ELEMENT_NODE, then at step 156 the opening tag ("<A" in the example being considered) is added from the tagName of the Node (myNode), the list of attributes for the Element is checked from 'myNode.attributes', and any that have non-blank values are added to the myHTML string. In the example, myHTML now reads "<A href='/test.html'". The same exercise is repeated for any style settings that have non-default values by scanning through the 'myNode.style' array. In the example there are no style settings, so myHTML is unchanged. The opening tag is then closed (myHTML = "<A href='/test.html'>"). Thus, in Figure 9, step 156 is executed in the order: the opening '<' and tag name, non-blank attributes, non-default style settings and finally the closing angle bracket '>'. In IE the list of attributes is very long and goes well beyond the list of attributes specified in DOM2. The list is therefore restricted to the attributes applicable to each Element type, which can be obtained from the DTD. For the sake of efficiency, the search through the style settings may be restricted to the core values relating to size, position and colours.
We now recursively repeat the exercise for each childNode, and in turn for each of their childNodes (including non-meaningful Elements), their childNodes, and so on. This is shown at step 158 in Figure 9, at which it is determined whether there are any childNodes. If there are, at step 160 the handle of each childNode is passed in turn to the script and the process is repeated recursively for it. The result is then appended to myHTML.
Referring to the Figure 8 example, the first node encountered is the IMG element. Repeating the above exercise of extracting attributes and styles creates myHTML = "<IMG src='/image/etfront'>". This node has no childNodes, so a check is made to see whether an end-tag is appropriate for this type of Element. In this case it is not since, according to the DTD for HTML, '<IMG>' elements do not have end-tags, so the local myHTML is returned to the parent node. For the link node, myHTML now reads "<A href='/test.html'><IMG src='/image/etfront'>". In Figure 9, the step of looking for an end-tag is shown at step 162. If one is required, the end-tag is appended to myHTML at step 164. If not, or after the end-tag has been appended, the finished myHTML is returned at step 166. The next childNode of the link is a Text Node, from which the nodeValue is extracted and returned to the parentNode. For the example link node, myHTML now reads "<A href='/test.html'><IMG src='/image/etfront'>Apricots are tasty". There are no more childNodes, so an end-tag is added to myHTML, if appropriate for this type of Element, to give the final result myHTML = "<A href='/test.html'><IMG src='/image/etfront'>Apricots are tasty</A>". The process is summarised by the following pseudocode.
    function extractHTML(myNode) {
        create empty string myHTML = ""
        if (myNode is an Element Node, i.e. myNode.nodeType == 1) {
            myHTML = myHTML + "<" + myNode.tagName
            for each member of myNode.attributes {
                if the attribute is non-default:
                    myHTML = myHTML + " [attribute name]='[attribute value]'"
                    (or myHTML + " [attribute name]" for boolean attributes)
            }
            if (any member of myNode.style is non-default)
                myHTML = myHTML + " STYLE='"
            for each member of myNode.style {
                if the style is non-default:
                    myHTML = myHTML + "[style name]:[style value];"
            }
            if (any member of myNode.style is non-default)
                myHTML = myHTML + "'"
            myHTML = myHTML + ">"
            for each member of myNode.childNodes {
                myHTML = myHTML + extractHTML(childNode)
            }
            if (the tagName of myNode requires a closing tag)
                myHTML = myHTML + "</" + myNode.tagName + ">"
        }
        else if (myNode is a Text Node, i.e. myNode.nodeType == 3) {
            myHTML = myHTML + myNode.nodeValue
        }
        return myHTML
    }

This is represented by Figure 9.
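A runnable JavaScript version of the pseudocode is sketched below. It works on plain-object stand-ins for DOM nodes, takes the no-end-tag list from the elements declared EMPTY in the HTML 4.01 DTD, and, going slightly beyond the pseudocode, escapes text and attribute values so that characters such as '<' in a text node cannot corrupt the captured HTML.

```javascript
// Elements declared EMPTY (no end-tag) in the HTML 4.01 DTD.
const EMPTY_ELEMENTS = new Set(["AREA", "BASE", "BASEFONT", "BR", "COL", "FRAME", "HR",
  "IMG", "INPUT", "ISINDEX", "LINK", "META", "PARAM"]);

// Escape characters that would otherwise be read as markup.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
function escapeAttr(s) {
  return escapeText(s).replace(/'/g, "&#39;");
}

// Stand-in node: { nodeType, tagName, attributes: {name: value},
//                  style: {cssName: value}, childNodes, nodeValue }
function extractHTML(myNode) {
  let myHTML = "";
  if (myNode.nodeType === 1) {
    myHTML += "<" + myNode.tagName;
    // Non-blank attributes only; boolean attributes (value true) are written bare.
    for (const [name, value] of Object.entries(myNode.attributes || {})) {
      if (value === true) myHTML += " " + name;
      else if (value !== false && value != null && value !== "") {
        myHTML += ` ${name}='${escapeAttr(String(value))}'`;
      }
    }
    // Non-default style settings are gathered into a single STYLE attribute.
    const styles = Object.entries(myNode.style || {}).filter(([, v]) => v !== "");
    if (styles.length > 0) {
      myHTML += " STYLE='" + styles.map(([k, v]) => `${k}:${escapeAttr(String(v))};`).join("") + "'";
    }
    myHTML += ">";
    for (const child of myNode.childNodes || []) myHTML += extractHTML(child);
    if (!EMPTY_ELEMENTS.has(myNode.tagName.toUpperCase())) myHTML += "</" + myNode.tagName + ">";
  } else if (myNode.nodeType === 3) {
    myHTML += escapeText(myNode.nodeValue);
  }
  return myHTML;
}

// The Figure 8 link.
const link = {
  nodeType: 1, tagName: "A", attributes: { href: "/test.html" },
  childNodes: [
    { nodeType: 1, tagName: "IMG", attributes: { src: "/image/etfront" }, childNodes: [] },
    { nodeType: 3, nodeValue: "Apricots are tasty" },
  ],
};
console.log(extractHTML(link));
// <A href='/test.html'><IMG src='/image/etfront'>Apricots are tasty</A>
```

Applied to the Figure 8 link, it reproduces the result derived step by step above.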
This description has glossed over one essential task the script must perform on the extracted HTML (or DOM subtree) before it is passed to the new window. Many web sites reference images, links etc. relative to a base URI, often the domain of the page being viewed. In the example, the image's SRC attribute looks like SRC='/image/{filename}'; this reference is relative to the domain of the publisher's server. If the user attempted to display this image from the repository site, he would not see the image, as the repository does not have a copy of the image file. The script therefore replaces SRC='/image/{filename}' with SRC='http://{domain-name}/image/{filename}'. This is easily done as the DOM subtree is traversed. Each time an attribute is found that may need changing, such as 'SRC' for '<IMG>' or 'HREF' for '<A>', a few string operations are performed that convert the relative URI to an absolute URI. A full list of attributes whose values are URIs can be obtained from the DTD. The process that converts relative to absolute URIs must satisfy Request for Comments RFC 1808, which can be found at www.ietf.org/rfc/rfc1808.txt. If the base URI in this example was 'www.domain.com', the final HTML to be captured would then read

    myHTML = "<A href='http://www.domain.com/test.html'><IMG src='http://www.domain.com/image/etfront'>Apricots are tasty</A>"

instead of

    myHTML = "<A href='/test.html'><IMG src='/image/etfront'>Apricots are tasty</A>"

There is now a list of meaningful elements, or of the common ancestors that make collections of Elements meaningful, together with the HTML that represents each of them (and their descendants) in the DOM.
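Rather than hand-written string operations, the relative-to-absolute conversion can be delegated to the URL parser built into JavaScript, as in the sketch below. Note that RFC 1808 has since been obsoleted by RFC 3986; for ordinary http URLs such as these, the WHATWG URL parser used here resolves relative references in the same way. The attribute table is a small assumed subset of the DTD's URI-valued attributes, and absolutize is a hypothetical name.

```javascript
// URI-valued attributes per element (a small assumed subset of the DTD's list).
// USEMAP is deliberately left out: "#name" must stay a same-document reference.
const URI_ATTRIBUTES = {
  A: ["href"], AREA: ["href"], IMG: ["src"], INPUT: ["src"],
  FORM: ["action"], LINK: ["href"], SCRIPT: ["src"],
};

// Rewrite relative URIs in a plain-object node tree to absolute ones, in place.
function absolutize(node, pageURL) {
  if (node.nodeType !== 1) return node;
  for (const name of URI_ATTRIBUTES[node.tagName] || []) {
    const value = node.attributes[name];
    if (typeof value !== "string" || value === "") continue;
    try {
      node.attributes[name] = new URL(value, pageURL).href; // resolve against the page
    } catch (e) {
      // Leave values the parser rejects untouched.
    }
  }
  for (const child of node.childNodes || []) absolutize(child, pageURL);
  return node;
}

const link = {
  nodeType: 1, tagName: "A", attributes: { href: "/test.html" },
  childNodes: [
    { nodeType: 1, tagName: "IMG", attributes: { src: "/image/etfront" }, childNodes: [] },
    { nodeType: 3, nodeValue: "Apricots are tasty" },
  ],
};
absolutize(link, "http://www.domain.com/");
console.log(link.attributes.href);              // http://www.domain.com/test.html
console.log(link.childNodes[0].attributes.src); // http://www.domain.com/image/etfront
```

Passing the full URL of the page being viewed, rather than just its domain, also handles document-relative references such as 'pics/a.gif'.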
Capturing the JavaScript associated with an "HREF" or "event" is theoretically possible but may cause unpredictable behaviour. The scripts in a page can be obtained from an array of script elements from the DOM. This array could be recreated in the HTML being saved, thereby ensuring that the script attached to the "HREF" or "event" is available when the repository displays the saved element. However, variable and function names in these scripts may clash with names from other sites, and the scripts may well refer to elements on the original web site that are no longer available once the element has been saved out of context. The ability to save the scripts associated with element attributes (including mouse and keyboard events) may therefore be disabled.
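If the ability to save scripts is disabled, the captured subtree has to be cleaned before it is passed on. One possible approach is sketched below; the function name and the choice to delete, rather than rewrite, javascript: URLs are assumptions.

```javascript
// Hypothetical cleaner: removes event-handler attributes (onclick,
// onmouseover, ...) and javascript: URLs, and drops SCRIPT children.
// Works in place on a plain-object node tree.
function stripScripts(node) {
  if (node.nodeType !== 1) return node;
  const attrs = node.attributes || {};
  for (const name of Object.keys(attrs)) {
    if (/^on/i.test(name) || /^\s*javascript:/i.test(String(attrs[name]))) {
      delete attrs[name];
    }
  }
  node.childNodes = (node.childNodes || []).filter(
    child => !(child.nodeType === 1 && child.tagName === "SCRIPT"));
  node.childNodes.forEach(child => stripScripts(child));
  return node;
}

const banner = {
  nodeType: 1, tagName: "A",
  attributes: { href: "javascript:openWin()", onmouseover: "swap()", title: "Offer" },
  childNodes: [{ nodeType: 3, nodeValue: "Offer" }],
};
stripScripts(banner);
console.log(banner.attributes); // { title: 'Offer' }
```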
The HTML data is then passed to a new window (or a new layer on the same page). The script, having identified the Nodes representing the common ancestor of each meaningful collection of elements, or having created a virtual ancestor where such a node does not exist, takes the HTML represented by each Node and its descendants and passes it as an array of data to a new window it creates. The HTML passed to the new window is written into a series of layers, or '<DIV>' elements, all of which are hidden from view apart from the default option, which is the HTML corresponding to the actual element over which the context menu was activated.
In its simplest manifestation, the layers are created by the following type of script (in pseudocode):

    for (i = 1 to number of meaningful elements) do {
        write the following HTML to our new window:
        "<DIV ID='myLayer[i]' STYLE='visibility:hidden'>myHTMLArray[i]</DIV>"
    }

If our default option was element no. 2 (for example), we would then modify the style as follows:

    document.getElementById('myLayer2').style.visibility = 'visible'

The User Then Makes His Selection

On this new window is a FORM with a pull-down menu of options, a '<SELECT>' tag, with one option for each of the meaningful collections of elements passed from the main window. As the user chooses different options from the menu, the corresponding layer is made visible and the others hidden. This is done by switching the style visibility setting of each DIV to 'visible' or 'hidden' accordingly.
This is illustrated in Figure 10, which shows a screen shot of a Window 200 in which the selected area to be saved 202 is displayed. The user selects from a drop-down menu 204 what he or she wants to save, for example the entire table, an image or a link, and clicks the "add to Copyn" button 206 to save the selection to the repository. A reset button 208 is provided to enable a selection to be cancelled.
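The layer mechanism described above can be sketched as two functions: one writing the hidden DIVs, numbered from 1 as in the pseudocode, and one switching visibility when the user picks an option. The names buildLayers and showLayer are hypothetical, and the usage lines pass a stand-in object instead of the new window's real document.

```javascript
// Build the markup written into the new window: one DIV per meaningful
// element, numbered from 1, all hidden except the default.
function buildLayers(myHTMLArray, defaultNumber) {
  return myHTMLArray
    .map((html, i) => {
      const n = i + 1;
      const visibility = n === defaultNumber ? "visible" : "hidden";
      return `<DIV ID='myLayer${n}' STYLE='visibility:${visibility}'>${html}</DIV>`;
    })
    .join("\n");
}

// Show the chosen layer and hide the rest. `doc` is the new window's
// document; anything with a getElementById() method will do.
function showLayer(doc, count, chosen) {
  for (let n = 1; n <= count; n++) {
    doc.getElementById("myLayer" + n).style.visibility = n === chosen ? "visible" : "hidden";
  }
}

// Usage with a stand-in document object instead of a real browser window.
const layers = {};
const fakeDoc = { getElementById: id => (layers[id] = layers[id] || { style: {} }) };
showLayer(fakeDoc, 3, 2);
console.log(layers.myLayer2.style.visibility, layers.myLayer1.style.visibility); // visible hidden
```

In the new window, showLayer would typically be called from the SELECT element's onchange handler, passing the selected option's 1-based position.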
When the user has finalised his choice (in our example, between the text 'Apricots are tasty', the image 'Love a Book', the link, which includes the text, the image and a target for the link, and the whole 2x2 table), he clicks on a button to 'post' the results from the form to a web server program (for example a CGI script written in Perl) running on the repository server. Posting is one of the methods of returning data to the server from an HTML form.
Until now there has been no interaction with the server.
Only the selected HTML is passed, together with other useful pieces of information such as the URL of the page from which it was obtained, the size of any image files (only possible in IE at present), etc. The exact choice of data to be returned will depend on customer demand, but this data is generally obtained by a limited number of methods, including: extracting the HTML for selected elements on the page; reading the height and width of the element as currently rendered by the browser (from the offsetHeight and offsetWidth fields), which is useful for determining the size of the element for display on the repository; obtaining browser or system data made available from the DOM (e.g. the type of browser or operating system); information about the web site and domain (such as the URL of the page); and date and time data.
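The posted data might be assembled as a form-encoded body as sketched below. The field names are hypothetical, since the description lists only the kinds of data returned; in a browser the inputs would come from the element's offsetWidth and offsetHeight, location.href and navigator.userAgent.

```javascript
// Hypothetical field names; the text lists the kinds of data, not a format.
function buildPayload({ html, offsetWidth, offsetHeight, pageURL, userAgent, savedAt }) {
  const params = new URLSearchParams();
  params.set("html", html);                       // extracted HTML
  params.set("width", String(offsetWidth));       // rendered size, for display
  params.set("height", String(offsetHeight));
  params.set("url", pageURL);                     // page the element came from
  params.set("domain", new URL(pageURL).hostname);
  params.set("browser", userAgent);               // browser/system data
  params.set("saved", savedAt.toISOString());     // date and time
  return params.toString(); // application/x-www-form-urlencoded body for the POST
}

const body = buildPayload({
  html: "<A href='http://www.domain.com/test.html'>Apricots are tasty</A>",
  offsetWidth: 120,
  offsetHeight: 40,
  pageURL: "http://www.domain.com/index.html",
  userAgent: "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
  savedAt: new Date(Date.UTC(2001, 2, 21)),
});
console.log(new URLSearchParams(body).get("domain")); // www.domain.com
```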
The server then stores the data as follows.
The server script first checks for a 'username' cookie. If it does not find one, the user is invited to log in or register. The user details are confirmed with, or stored in, a database table on the server. This use of cookies for identifying users, validating passwords etc. is common practice online and will not be described any further.
Once the user has been validated, the server script takes the data provided by the form and adds it to the user's repository. An SQL query may be made to ensure that the data is not a repeat of content already in the user's repository.
The data is stored in the 'default' category determined by the user's predefined preferences.
Once all this has been done, the content of the 'new window' is replaced with a message from the server. A confirmation message, showing what has been saved, is displayed in the new window. After a short preset period of time, for example 5 seconds, the new window closes itself.
The HTML representing the user's selected generic Element has now been passed to his repository for subsequent retrieval.
Database Representation

The following representation of the database and its associated tables and data allows the embodiment to be recreated, but may not necessarily be the most efficient implementation that could be developed. Sufficient information about the requirements is, however, provided to allow a more sophisticated database to be developed.
The information set out below relates only to the implementation of the embodiment and not to other data and services that may be useful from a commercial point of view. For example, in a commercial implementation we may seek further user data beyond the Name and Password (e.g. e-mail address etc.). Implementation of such additional features is straightforward for one of ordinary skill in the art.
The core data will be split into 9 data tables (more tables may be added later depending on business requirements). Taking each data table in turn, the purpose of each table and the primary fields required are as follows.

User Data Table
This captures information about each user and basic preference data such as their default group and default repository.
- User Name: name by which the user identifies himself.
- User Password: password the user selects to control access to the account.
- Default Group = Groups Data {Unique Id.}: default collaborative Group to which the user belongs (may be blank).
- Default Repository = Repository Data {Unique Id.}: default repository of saved elements; each user has one or more repositories.
- Unique Id.: system-generated identifier for the User.

Element Data Table
This is the core data saved by the client interface described. It holds the HTML, domain details etc. but nothing about how this data is to be displayed on the repository interface.
- Raw HTML: HTML extracted and saved by the Client Interface.
- Source Domain Name: domain name of the site from which the raw HTML was taken.
- Source Page URL: URL of the page from which the raw HTML was taken.
- Date/Time Created: date/time the element was saved.
- Date/Time last visited: date/time the element was last clicked on (if a link).
- Owner Repository = Repository Data {Unique Id.}: repository within which the element is saved. Elements can exist within more than one View for the same repository.
- Copy of Me = Element Data {Unique Id.}: location of a copy of the element (created, for example, if the user sends a copy of part of the repository to another user). This copy of the Element may need updating if the underlying element of its own, etc.
- Unique Id.: system-generated identifier for the Element.
Card Data Table
The information in this table captures the display, formatting and position of the Element Data. The card has information about which leaf it is displayed on. Any given Element can be associated with several different Cards.
- Associated Element Data {Unique Id.}: the Element associated with this card.
- Position/Size etc.: examples of customisation options specific to each card, such as location on screen (within the leaf).
- Background Colour etc.: examples of customisation options that can be common to many cards; these can be overwritten by, or inherited from, the Owner Leaf.
- Comment/Description Text etc.: examples of text fields the user can add or modify to describe or comment on the card/element.
- Owner Leaf = Leaf Data {Unique Id.}: identifier for the Leaf of which this card is a part.
- Date/Time last visited: date/time the element was last clicked on (if a link), specific to this card.
- Unique Id.: system-generated identifier for the Card.
Leaf Data Table
The User's screen, in a given view, is split into a number of Leaves navigable by tabs, similar to a spreadsheet in MS Excel and other products. Each Leaf holds information about its own display as well as default values for any Cards placed in it. In essence, Leaves can be used to categorise and classify Cards and hence Elements.
- Owner View = View Data {Unique Id.}: the View of which this Leaf is a part.
- Leaf Title: a descriptive title used for the tab label.
- Reference to View = View Data {Unique Id.}: in order to accommodate sub-Leaves, a leaf can include a pointer to a View; this View and its Leaves will appear within this Leaf (see Figure 4 for an illustration).
- Background colour, text font, border type etc.: customisation options for the leaf that drive its display. Some settings may be inherited from default values at View level.
- Background colour, text font, border type etc.: default customisation settings for cards that appear within the leaf.
Unique Id. System generated identifier for Leaf Leaf Data Table View Data Table A View is made up of a collection of Leaves and hence cards and in turn Elements. Overall View settings can easily be copied from one Repository to another.
View Name (descriptive): A descriptive title used by the User to identify the View (a short-form title may also be held for use in menu options).
Owner Repository = Repository Data {Unique Id.}: The Repository to which this View applies.
Overall customisation data (e.g. page size, type of waste-bin etc.): Some customisation data exists at View level; this includes the location of the waste-bin, the position of the Leaf tabs and default values for Leaf settings.
Unique Id.: System-generated identifier for the View.
View Data Table

Repository Data Table
Each User, or collaborative Group of Users, has one or more Repositories of data. This table holds the identification and administrative data, together with the default View associated with the Repository.
Owner User/Group = User Data {Unique Id.}, Group Data {Unique Id.}: Each Repository has an owner/administrator responsible for it. This can be a single User or a Group.
Default View = View Data {Unique Id.}: Each Repository has a default View.
Unique Id.: System-generated identifier for the Repository.
Repository Data Table

Groups Data Table
Users can belong to collaborative Groups that can access shared Repositories; this table captures information identifying the Group and its default Repository.
Universal Groups allow Users to make their Repositories/Views available to everyone, e.g. for public read access.
Group Name (descriptive): Descriptive title for the Group (a shorter version may also be held for menu labels).
Owner User = User Data {Unique Id.}: Administrator/owner of the Group; this User is responsible for the Repositories (and hence Views etc.) owned by the Group.
Default Repository = Repository Data {Unique Id.}: Default Repository for the Group.
Unique Id.: System-generated identifier for the Group.
Groups Data Table

UserGroup Data Table
This table maps Users to Groups. It is used to determine which Users are members of which Groups.
User = User Data {Unique Id.}: Name of the User belonging to the Group.
Group = Groups Data {Unique Id.}: Group identifier.
Unique Id.: System-generated identifier for the UserGroup linkage.
UserGroup Data Table

Permissions Data Table
This table is used to restrict and manage access privileges to data in the other tables. For example, it can be used to limit access to a Repository or View.
Associated Data = {Unique Id.}: Unique Id. from any of the data tables above: Element, Card, Leaf, View, Repository or Group.
Associated Table = Table Name/Data Type: Name of the data table to which the above identifier refers.
Recipient User/Group {Unique Id.}: User or Group to which this permission relates.
Grantor = {Unique Id.}: User or Group that owns this permission.
Type of Permission: Whether the permission relates to the ability to read, modify, create, delete, administer etc.
Unique Id.: System-generated identifier for the Permission.
Permissions Data Table

The Permissions data table is very important. The data can be used as follows:
A Group owner may grant the right to administer Group membership to another User. In this case the Group owner is the Permission Grantor, the second User is the Recipient User, the Type of Permission is administration, the Associated Table is the Group data table and the Associated Data is the Group for which the second User is being given the permission.
A User may grant universal read access to a specific View of a specific Repository. In this case the Permission is set for the View: the Grantor is the User, the Type of Permission is read access, the Recipient Group is the Universal Group and the Associated Data is the View. A Permission for the Repository is created with the same settings. The Repository cannot be 'looked' at other than via a View, so granting this Repository Permission does not allow access to other Views.
A Group may choose to organise itself with each User having full access to one Leaf each and read access to all the other Leaves. This can easily be achieved by setting the appropriate permissions on each Leaf.
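The way these Permission rows combine can be sketched as a lookup over in-memory records. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`associatedTable`, `associatedId`, `recipient`, `type`) and the `UNIVERSAL` group id are hypothetical stand-ins for the columns described above, and a User's Group memberships are passed in rather than read from the UserGroup Data Table.

```javascript
// Hypothetical id of the Universal Group (illustrative only).
const UNIVERSAL = "group:universal";

// True if the user, directly, via one of their Groups, or via the Universal
// Group, holds permission `type` on the row identified by (table, id).
function hasPermission(permissions, userId, userGroups, table, id, type) {
  const principals = new Set([userId, UNIVERSAL, ...userGroups]);
  return permissions.some(
    (p) =>
      p.associatedTable === table &&
      p.associatedId === id &&
      p.type === type &&
      principals.has(p.recipient)
  );
}

// A Repository is only ever read through a View, so reading a View requires
// read permission on both the View and the Repository that owns it.
function canReadView(permissions, userId, userGroups, view) {
  return (
    hasPermission(permissions, userId, userGroups, "View", view.id, "read") &&
    hasPermission(permissions, userId, userGroups, "Repository", view.repositoryId, "read")
  );
}

// A User grants universal read access to View v1 of Repository r1.
const permissions = [
  { grantor: "user:alice", recipient: UNIVERSAL, type: "read", associatedTable: "View", associatedId: "v1" },
  { grantor: "user:alice", recipient: UNIVERSAL, type: "read", associatedTable: "Repository", associatedId: "r1" },
];

console.log(canReadView(permissions, "user:bob", [], { id: "v1", repositoryId: "r1" })); // true
console.log(canReadView(permissions, "user:bob", [], { id: "v2", repositoryId: "r1" })); // false
```

Because the Repository-level row is only consulted together with a View-level row, it grants nothing for another View (v2) of the same Repository.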
The database also stores a copy of the various DTDs used to define the syntax of HTML markup constructs. These will be the first of many DTDs to be captured in the database and will form the dataset from which the rulesets required to capture and display broader XML elements can be developed and recorded.
The database used may be a standard SQL database or another type of relational database, which the web server accesses via Perl/CGI or another interface mechanism between the web server and the database.
The data structure set out above allows Groups, Views, Leaves, Cards, Permissions etc. to be customised.
The repository user interface will now be described in greater detail.
There are two aspects to the Repository User Interface ("RUI"): the representation of the data in a relational database, as described above, and the free-form visual user interface, one implementation of which is described here.
Before describing the mechanics of how the visual interface works, it is useful to give a brief description of how the database structure ties in to the practical use of the system: "Users" can belong to any number of collaborative "Groups" (including none).
The administrator of a Group manages the repository access privileges of Group members and can also allow universal read access to a repository.
Users and Groups can have one or more Repositories.
Repositories can have more than one View. The user can switch views at any time by choosing the desired view from a drop down menu.
Views are constructed of a customisable set of Leaves. The number of Leaves can vary, as well as their layout on the screen. In the default layout, the Leaves overlap each other with non-overlapping tabs at the top to allow the user to switch from leaf to leaf. Leaves can have different background colours or images. Leaves provide default customisation parameters to the Cards displayed on them. A Leaf tab can point to a View to be displayed completely within the Leaf to form a type of sub-Leaf.
This allows the type of multi-level leaf structure illustrated in Figure 4.
Leaves display a number of customisable Cards. Each Card can be customised or can inherit its settings from the default values stored at Leaf level. Customisation includes background colour (including transparent, or even a background image), border type, whether a comment field should be displayed etc. Each Card displays one Element and can have comments/descriptions attached, which can include hyperlinks added by the user. Cards can display information about the page from which the Element was stored, date of last access etc. The Card can be repositioned on the screen and resized by dragging with the mouse. The Card can be moved (or copied) to another Leaf by dragging it onto the new Leaf's tab. The Card can be removed from the View entirely by dropping it onto the waste-bin icon. Changes in customisation settings are returned to the server so that the View is kept up to date.
Each Element represents the ancestor Node of a meaningful collection of Elements stored from a web site via the Client Interface described earlier. This is rendered by the user's web browser to appear within the Card, with the customisation set as required by the user.
The preceding description set out the data structure underlying the embodiment in some detail. This section sets out how it is tied in with the user interface.
Rather than describing the interface sequentially, as was done for the Client Interface, this section will describe how all the key functionality is achieved.
Overall Structure of the Repository Interface.
The user accesses a repository by opening their home-page on the server. This site can also be launched by using an extension to the browser context menu, as described earlier.
The data sent to the user's web browser from the repository server consists of three main groups:
1. Javascript Code (browser-side script)
A fairly substantial piece of Javascript will be delivered to the web browser. This would typically be cached automatically by the user's machine, so the performance overhead will be very limited. Much of the customisation data specific to the Repository/View combination being viewed will be passed to the script as parameters, which the script uses to build the page being viewed, customised for the situation.
The way that the script works, and how it obtains, processes and updates the customisation data, will be described in some depth later.
2. Database dependent HTML generated by a CGI/Perl script (server side script).
It is preferred to implement the web-server scripting and database access using CGI/Perl but this is not the only choice available. The way that this code works for the significant parts of the process will be described in some detail later. The process will be similar regardless of language choice on the server.
3. Static HTML.
Very little of the RUI is static HTML. Most of it is customised for the specific user/repository/view, either by the web server or by Javascript.
Obtained Data.
User Details
When the user first requests access to the repository, the repository site reads a cookie, specific to the repository server's domain, containing a username and encrypted password combination. This is checked against the values stored in the User data table using a simple SQL query. If there is no cookie stored, or the username/password combination is invalid, the user is asked to try again or to register for the service.
This whole mechanism is commonplace on the Internet and so will not be described in more detail.
Default settings
Once the user has been validated, all of their preference data in the User data table can be accessed. This includes their default Repository and Group; this data is used to determine the initial data/display they see on the RUI (i.e. their repository home page).
The default Repository is looked up in the Repository data table. This then provides the server based script with the default View, with its customisation data. This in turn is used to find all the Leaves included in this View, with their customisation data. These in turn give the cards with customisation data and finally the Elements themselves. This data is obtained by a number of database queries.
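The chain of lookups just described (Repository, then default View, then Leaves, then Cards, then Elements) can be sketched as follows. In the system described, each step is an SQL query issued by the server-side script; in this sketch plain JavaScript arrays stand in for the tables, and all property names are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for the Repository, View, Leaf, Card and
// Element data tables.
const db = {
  repositories: [{ id: "r1", defaultView: "v1" }],
  views: [{ id: "v1", ownerRepository: "r1", name: "Home" }],
  leaves: [
    { id: "l1", ownerView: "v1", title: "News" },
    { id: "l2", ownerView: "v1", title: "Shops" },
  ],
  cards: [
    { id: "c1", ownerLeaf: "l1", element: "e1" },
    { id: "c2", ownerLeaf: "l2", element: "e2" },
  ],
  elements: [
    { id: "e1", html: "<A HREF='http://example.com/'>Example</A>" },
    { id: "e2", html: "<IMG SRC='http://example.com/ad.gif'>" },
  ],
};

// Plays the role of "SELECT * FROM table WHERE column = value".
const where = (rows, column, value) => rows.filter((r) => r[column] === value);

// Walks from a Repository down to its Elements, nesting the results
// View -> Leaves -> Cards, with each Card carrying its Element's HTML.
function loadDefaultView(db, repositoryId) {
  const repo = where(db.repositories, "id", repositoryId)[0];
  const view = where(db.views, "id", repo.defaultView)[0];
  return {
    ...view,
    leaves: where(db.leaves, "ownerView", view.id).map((leaf) => ({
      ...leaf,
      cards: where(db.cards, "ownerLeaf", leaf.id).map((card) => ({
        ...card,
        elementHtml: where(db.elements, "id", card.element)[0].html,
      })),
    })),
  };
}

const view = loadDefaultView(db, "r1");
console.log(view.leaves.map((l) => l.title + ": " + l.cards.length + " card(s)"));
```

With a real database, the inner lookups would more likely be batched (for example one query per table filtered on the View's Leaves) rather than issued once per row.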
A significant block of HTML data, together with the customisation settings pertaining to the User's default Repository and its default settings, has now been extracted from the database.
There now follows a description of how the data from the database is delivered to the browser script.
There are a number of ways in which this can be achieved, but they involve the same basic principle. The following describes a specific solution using the IFRAME element, the HTML element for creating floating frames.
The browser-side script creates a hidden IFRAME element on the page (it is hidden by setting its style attribute accordingly). The IFRAME receives the data from the server because its SRC attribute is set to call a server-side script.
The following type of command would achieve this:
    document.writeln("<IFRAME NAME='hdnl' SRC='/perl/myData.cgi' STYLE='visibility:hidden'></IFRAME>");
During the construction phase of the web page this causes the server-side script 'myData.cgi' to be executed. This server-side script in turn creates a new browser-side script, within the hidden IFRAME, containing the customisation data required. This is done by making the database queries mentioned in the previous section and writing the results out into a series of arrays.
These arrays allow the data to reflect the hierarchy of items to be displayed. Each piece of Element data is stored within a card data array, together with its customisation data. The data for a group of Cards is held in a Leaf data array, and the Leaf data is held within a View array.
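The server-side step that writes the query results out as nested arrays inside the hidden IFRAME can be sketched as below. The document describes a CGI/Perl script; this sketch is written in JavaScript instead (as it might run under a server-side runtime such as Node.js), and the global name `viewData`, the array layout and the use of JSON serialisation are assumptions, not details from the source.

```javascript
// Builds the page returned into the hidden IFRAME: a <SCRIPT> block that
// declares nested arrays mirroring View -> Leaf -> Card -> Element.
function buildDataPage(view) {
  const leafArrays = view.leaves.map((leaf) => [
    leaf.title,
    leaf.cards.map((card) => [card.id, card.style || "", card.elementHtml]),
  ]);
  // JSON array/string syntax is also valid JavaScript literal syntax.
  // "</" is escaped so stored HTML cannot close the <SCRIPT> element early.
  const literal = JSON.stringify([view.name, leafArrays]).replace(/<\//g, "<\\/");
  return "<HTML><BODY><SCRIPT>var viewData = " + literal + ";</SCRIPT></BODY></HTML>";
}

const page = buildDataPage({
  name: "Home",
  leaves: [{ title: "News", cards: [{ id: "c1", elementHtml: "<B>Hi</B>" }] }],
});
console.log(page);
```

Once the IFRAME has loaded, the parent page's script could then read the hypothetical `viewData` global through the IFRAME's window object.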
Once the script (myData.cgi in this case) has finished executing and the results have fully loaded into the IFRAME, this data is available to the main browser script that controls the creation of the page. The content of the IFRAME can be accessed via: document.frames.hdnl.arrayvariablename etc.
Using the customisation data from the database.
The overall structure of the page is determined, either by HTML received from the server or by the script. This process is very commonplace and will not be described here. At this stage there is a fairly content free page, perhaps displaying a logo, copyright and terms and conditions statement etc.
Once the customisation data has been loaded from the server the controlling script proceeds to create the remainder of the web-page. The overall customisation data is used to add a little more detail to the page for example the choice of wastebin image and by changing the default colour scheme. This is done by modifying the style settings of items that already exist within the DOM and inserting new items, such as the wastebin (the wastebin is added in much the same way as Leaves and Cards which are described below).
The required number of Leaves is added, with the visibility setting of the default Leaf set to 'visible' and the others set to 'hidden'. The Cards are then drawn on each Leaf.
Leaf construction and manipulation
Because Leaves will also be added and deleted by the user after the page has finished loading, the same mechanism can be used when first inserting the Leaves into the document. DOM2 provides a standard way of doing this, and the two browsers (IE5+ and NN6+) each provide a convenient, but non-standard, mechanism for inserting HTML into the document. These methods do not form part of the DOM2 specification but are more efficient than the DOM2 methodology.
In both cases a blank string (myHTML, say) is created. The script loops over the number of Leaves, incrementally adding HTML as text to myHTML. For each Leaf we do something like the following:
    myHTML = myHTML + "<DIV ID='Leafn' STYLE='leafstylen'></DIV>";
where Leafn is an identifier for Leaf number 'n' and leafstylen incorporates the customised display settings for the Leaf, making sure that the Leaf style takes account of which Leaf is to be displayed initially.
For NN6, a DocumentFragment (a free-standing DOM subtree) is created from myHTML using the createContextualFragment method of a Range object, and it is inserted as a new child of the BODY element using the appendChild method. Note that the same result could be achieved by creating the Element and its attributes one at a time using DOM2-compliant methods. Whilst this is a purer approach, it is far less efficient.
For IE5, myHTML is inserted using the insertAdjacentHTML method of the BODY element, placing the HTML before the end of the element.
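The Leaf-building loop can be sketched as a pure string builder. The DOM insertion step needs a browser, so it appears only in comments; the input shape (`leaves` with a `background` property) and the exact CSS properties written into each STYLE are hypothetical.

```javascript
// Builds one DIV per Leaf; only the initially displayed Leaf is visible.
function buildLeavesHtml(leaves, defaultIndex) {
  let myHTML = "";
  leaves.forEach((leaf, n) => {
    const visibility = n === defaultIndex ? "visible" : "hidden";
    const leafStyle =
      "position:absolute; visibility:" + visibility +
      "; background-color:" + leaf.background + ";";
    myHTML += "<DIV ID='Leaf" + n + "' STYLE='" + leafStyle + "'></DIV>";
  });
  return myHTML;
}

// In a browser the string would then be inserted, for example:
//   NN6: const range = document.createRange();
//        range.selectNodeContents(document.body);
//        document.body.appendChild(range.createContextualFragment(myHTML));
//   IE5: document.body.insertAdjacentHTML("beforeEnd", myHTML);
const html = buildLeavesHtml([{ background: "#ffffcc" }, { background: "#ccffff" }], 0);
console.log(html);
```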
Small 'tabs' are created to appear at the top of each layer. These are created using the same layer technology as the Leaves themselves, with the DIV elements dimensioned appropriately and placed just above the Leaves. A text-based link is placed on each DIV element. The text of the link is the Leaf Title, from the customisation data, and the HREF attribute is set to run a simple javascript function that switches the Leaf being displayed to the one corresponding to the tab clicked on. A mouse event can be used to trigger the Leaf switch in place of the HREF approach for more refined handling. The script merely switches the visibility style flag on each Leaf layer to achieve this. Additionally, when a user selects a tab, its background colour is changed (again using its style setting) to highlight the active Leaf title.
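The tab-switching function can be sketched as below. Plain objects with a `style` property stand in for the Leaf and tab DIV elements, so the logic runs outside a browser; the function name `switchLeaf` and the highlight colours are hypothetical.

```javascript
// Shows Leaf `target`, hides all the others and highlights the matching tab.
// `leaves` and `tabs` are arrays of DIV-like objects ({ style: {...} }).
function switchLeaf(leaves, tabs, target, colours) {
  leaves.forEach((leaf, n) => {
    leaf.style.visibility = n === target ? "visible" : "hidden";
  });
  tabs.forEach((tab, n) => {
    tab.style.backgroundColor = n === target ? colours.active : colours.inactive;
  });
}

// In a browser a tab link might call it as, e.g.:
//   <A HREF="javascript:switchLeaf(leafDivs, tabDivs, 1, tabColours)">Shops</A>
const mkDiv = () => ({ style: {} });
const leaves = [mkDiv(), mkDiv(), mkDiv()];
const tabs = [mkDiv(), mkDiv(), mkDiv()];
switchLeaf(leaves, tabs, 1, { active: "#ffcc00", inactive: "#dddddd" });
console.log(leaves.map((l) => l.style.visibility)); // [ 'hidden', 'visible', 'hidden' ]
```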
Sub-leaves can be created within the layer representing the leaf, with tabs appearing at the top of the sub-Leaf, immediately below the tabs for the main Leaves themselves.
This is achieved by using a Leaf Tab as a pointer to another View which is then created within the Leaf (as opposed to within the BODY of the document). In the above description of creating a Leaf the appendChild (or insertAdjacentHTML) method is applied to the Leafn element instead of the BODY element.
At any point the user can insert a new Leaf by running a script function, which can be attached to a button, a main menu item or the context menu. This script creates a new empty Leaf using the same technique as described for creating the other Leaves. In this case there is no data to be obtained from the database, so the new Leaf's settings are set to the default values for the View until they are overwritten by the user.
The overall page structure is now set up and the Leaves are displayed. But they have no content.
Card construction.
Cards are constructed in a similar way to the Leaves. In this case, however, the card is a more complex item to construct.
A card has a few core parts: the containing layer, which forms the outer boundary of the card; the element layer, a sub-layer of the containing layer that contains the Element stored in the database; the comment layer, a sub-layer of the containing layer that contains any comments and additional text fields related to the Element stored in the database; and the resizing layer, a sub-layer of the containing layer that provides a box the mouse pointer can click on to resize the containing layer and, with it, the element and comment sub-layers.
These layers are called cardLayern, cardSubLayern, cardCmtLayern and cardRszLayern in the following description, where n refers to the card number and is unique within the View. In other words, the numbering does not restart with each Leaf. The customisation settings, passed from the database via the IFRAME element, are captured as STYLE settings associated with each layer that makes up the card (cardLayerStylen = cardLayern.style, cardSubLayerStylen, cardCmtLayerStylen, cardRszLayerStylen).
For each card, a piece of HTML (say 'myHTML') is constructed along the following lines:
    myHTML = myHTML
      + "<DIV ID='cardLayern' STYLE='cardLayerStylen'>"
      + "<DIV ID='cardSubLayern' STYLE='cardSubLayerStylen'>" + myElementData + "</DIV>"
      + "<DIV ID='cardCmtLayern' STYLE='cardCmtLayerStylen'>" + myCommentData + "</DIV>"
      + "<DIV ID='cardRszLayern' STYLE='cardRszLayerStylen'></DIV>"
      + "</DIV>";
where myElementData is the raw HTML captured by the user and obtained from the database, and myCommentData contains the comments and descriptors that the user has opted to display.
This piece of HTML is then inserted into the appropriate Leaf Layer (as opposed to the BODY Element).
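A card-building function following this template might look like the sketch below. The card number, the style strings and the percentage layout of the sub-layers are illustrative assumptions; both data arguments are inserted as HTML because the stored Element is markup and comments may contain links the user added.

```javascript
// Builds the four nested layers of card n. `styles` holds the STYLE strings
// derived from the customisation data.
function buildCardHtml(n, styles, myElementData, myCommentData) {
  return (
    "<DIV ID='cardLayer" + n + "' STYLE='" + styles.card + "'>" +
    "<DIV ID='cardSubLayer" + n + "' STYLE='" + styles.sub + "'>" + myElementData + "</DIV>" +
    "<DIV ID='cardCmtLayer" + n + "' STYLE='" + styles.cmt + "'>" + myCommentData + "</DIV>" +
    "<DIV ID='cardRszLayer" + n + "' STYLE='" + styles.rsz + "'></DIV>" +
    "</DIV>"
  );
}

// Sub-layers are absolutely positioned in percentages of cardLayer n, so
// they move and resize with it (the values here are only examples).
const styles = {
  card: "position:absolute; left:40px; top:60px; width:200px; height:150px;",
  sub: "position:absolute; left:0%; top:0%; width:100%; height:75%; overflow:auto;",
  cmt: "position:absolute; left:0%; top:75%; width:100%; height:25%;",
  rsz: "position:absolute; left:92%; top:92%; width:8%; height:8%; cursor:se-resize;",
};
const cardHtml = buildCardHtml(7, styles, "<IMG SRC='http://example.com/ad.gif'>", "Seen on a news site");
console.log(cardHtml);
```

The resulting string would then be inserted into the appropriate Leaf layer with the same browser-specific method used for the Leaves.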
Since the creation of the cards causes their associated Elements to be loaded from the relevant third-party servers (as determined by the SRC attributes of images etc.), the order in which they are loaded needs to be controlled. The script staggers the creation of cards on all but the default Leaf, to allow time for the cards on the default Leaf to load. This delay is overruled if the user switches the display to another Leaf. This extra sophistication is built into the Leaf-switching script attached to each tab (as described in the previous section): a flag is checked to see whether the cards on the new Leaf have been created and, if not, they are created immediately.
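The staggering and the "created" flag can be sketched as a small scheduler. The timer function is injected (in a browser it would be `window.setTimeout`) so the logic can be exercised outside a browser; `makeCardScheduler`, `createCards` and the delay value are hypothetical.

```javascript
// Creates the default Leaf's cards at once and defers the others, one
// timer step apart. If the user switches to a Leaf before its timer fires,
// its cards are created immediately; the `created` flags stop them being
// built twice.
function makeCardScheduler(leafCount, defaultLeaf, createCards, setTimer, delayMs) {
  const created = new Array(leafCount).fill(false);
  const ensure = (n) => {
    if (!created[n]) {
      created[n] = true;
      createCards(n);
    }
  };
  ensure(defaultLeaf);
  let step = 0;
  for (let n = 0; n < leafCount; n++) {
    if (n !== defaultLeaf) {
      step += 1;
      setTimer(() => ensure(n), delayMs * step);
    }
  }
  // onSwitch is meant to be called by the tab-switching script.
  return { onSwitch: ensure, isCreated: (n) => created[n] };
}

// Usage with a fake timer queue standing in for window.setTimeout.
const pending = [];
const log = [];
const scheduler = makeCardScheduler(3, 0, (n) => log.push(n), (fn) => pending.push(fn), 500);
scheduler.onSwitch(2);         // the user clicks the third tab early
pending.forEach((fn) => fn()); // the deferred timers fire later
console.log(log);              // [ 0, 2, 1 ]
```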
The position style setting of each layer is set to 'absolute', and the dimensions of the sub-layers are then defined as percentages of the containing layer (cardLayern). This means that the layers all move and resize together.
Control of Card Content.
Stored elements and meaningful collections of elements are displayed out of the context in which they were created, so they may not be displayed in the intended way.
Some elements provide their dimensions as a matter of course, as is the case for most images, for example, or where the original web publisher required them for a specific layout. In addition, the actual height and width of the element as displayed on the screen were captured when the user originally saved the element.
This information is used to determine the size and shape of the element as it should appear in its card, and to clip the region so that the elements do not spill out over the edge of the containing layers. This can be done by setting the clip style setting of the cardSubLayer.
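The clipping step can be expressed as a small function that turns the captured width and height into the CSS `clip` value for a cardSubLayer. The `rect(top, right, bottom, left)` form is the CSS2 syntax, and `clip` only applies to absolutely positioned elements; taking the smaller of the captured size and the layer size is an assumption about how the two should be combined, and the function name is hypothetical.

```javascript
// Returns style settings that confine a stored Element to the size it had
// when captured, never exceeding the current size of the cardSubLayer.
function clipStyleFor(captured, layer) {
  const w = Math.min(captured.width, layer.width);
  const h = Math.min(captured.height, layer.height);
  return {
    position: "absolute",
    clip: "rect(0px, " + w + "px, " + h + "px, 0px)",
  };
}

console.log(clipStyleFor({ width: 468, height: 60 }, { width: 300, height: 150 }).clip);
// rect(0px, 300px, 60px, 0px)
```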
For some Elements, in particular images with or without an associated link, the dimensions of the Element can be set to resize with the dimensions of the cardSubLayer. This is done by setting their position style to 'absolute' and fixing their width and height to fixed percentages of the cardSubLayer. This has the effect of causing the image to change shape as the user changes the shape of its container. This will also be possible for other select Elements. For other Elements, if the cardSubLayer becomes too small to contain the Element, the content will be clipped or scroll bars will appear (depending on the Element type). The scroll bars appear if the overflow style setting of the cardSubLayer is set to 'auto'.

Moving and Resizing Cards, moving Cards to another Leaf or dropping them in the Wastebin.
With both IE5 and NN6 browsers mouse events can be attached to various elements, including the DIV elements from which the card is built.
The mouse events of interest are: onmousedown ; onmousemove; and onmouseup.
Many articles have been written about moving items on web displays using the mouse, so only a broad overview of one way of doing this is given here. Further information may be found at http://developer.netscape.com/viewsource/goodmandrag/goodmandrag.html

onmousedown
Once the cards have been created, the onmousedown method of each cardLayern is assigned to a script function ('engageLayer'). This function now 'listens' for this event being triggered by the user's mouse interacting with this element on the screen. The function is called when the user presses a mouse button over the portion of the layer not covered by other items, and not otherwise. When called, this function sets a global variable ('selectedLayer') equal to the element returned by the event (NN6: evt.target; IE: window.event.srcElement), records the (x, y) coordinates of the mouse when it was pressed down and sets the onmousemove method of the document equal to a script function ('moveLayer').

onmousemove
The first thing the script does is test whether 'selectedLayer' has been set. Assuming it has, it resets the location parameters for the cardLayern by adding in the change in the (x, y) coordinates of the mouse since the mouse last moved (or was first pressed down). Finally, the recorded (x, y) coordinates of the mouse are updated. The browser triggers this method at discrete intervals, but frequently enough that the movement of the Card on the screen appears smooth to the user.

onmouseup
The onmouseup method of the document is set to a script function ('disengage') from the moment the layer is first created. The first thing this script does when called is test whether 'selectedLayer' has been set. Assuming it has, it sets selectedLayer to null and unsets the onmousemove method of the document. This gives the user the impression that the card has been 'let go'.
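The engage/move/disengage cycle can be sketched with the browser specifics factored out: an event is reduced to `{ x, y, target }` and the card layer to an object with numeric `left` and `top`, and a variable stands in for `document.onmousemove`. The function names follow the description above, but the code itself is an illustrative assumption; in a browser, the coordinates and target would be read from the NN6 or IE event object.

```javascript
let selectedLayer = null;   // card being dragged, or null
let lastX = 0, lastY = 0;   // mouse position at the previous event
let moveHandler = null;     // stands in for document.onmousemove

function engageLayer(evt) {
  selectedLayer = evt.target;
  lastX = evt.x;
  lastY = evt.y;
  moveHandler = moveLayer;  // start listening for mouse movement
}

function moveLayer(evt) {
  if (!selectedLayer) return;
  // Shift the card by the distance moved since the previous event.
  selectedLayer.left += evt.x - lastX;
  selectedLayer.top += evt.y - lastY;
  lastX = evt.x;
  lastY = evt.y;
}

function disengage() {
  if (!selectedLayer) return;
  selectedLayer = null;
  moveHandler = null;       // stop tracking: the card has been "let go"
}

// Simulated drag of a card at (100, 50) by +30 / -10 pixels.
const card = { left: 100, top: 50 };
engageLayer({ target: card, x: 110, y: 60 });
moveHandler({ x: 125, y: 55 });
moveHandler({ x: 140, y: 50 });
disengage();
console.log(card); // { left: 130, top: 40 }
```

Working in deltas rather than absolute mouse positions keeps the card's offset from the pointer constant, so the card does not jump when it is first engaged.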
To improve the user's experience when moving cards on the screen the following steps are performed:
The background colour of the cardLayer changes when it is 'engaged'. The whole cardLayer can also be made transparent while it is being moved.
The background colour changes back when it is 'disengaged'.
The z-index, which determines the stacking order of cards above one another, is set to a high value when the Card is engaged. This means that the Card appears above the other Cards on the screen. This may be done by tracking the highest allocated z-index value, using a z-index value one greater than the highest used to date, and updating the maximum z-index variable each time this new high level is set.
When the user drags the Card off the edge of the screen there is a risk that the onmouseup event will be missed by the script and the Card will continue to move around even though the mouse button has been released. This is countered by tracking the edges of the browser window and forcing the 'disengage' function to be called each time the mouse crosses the edge of the window.
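Two of these refinements, the rising z-index and the forced release at the window edge, can be sketched as below. The counter `maxZIndex`, the window-size object and the `disengage` callback passed in are assumptions for illustration; in practice the counter would be initialised from the highest z-index already used by the cards.

```javascript
let maxZIndex = 0; // highest z-index allocated so far (hypothetical counter)

// Brings an engaged card to the front by giving it a new, unused z-index.
function bringToFront(layerStyle) {
  maxZIndex += 1;
  layerStyle.zIndex = maxZIndex;
}

// Called on every mouse move: if the pointer reaches the edge of the browser
// window, release the card so it cannot keep following the mouse after the
// button has been released outside the window.
function checkWindowEdge(x, y, windowSize, disengage) {
  const outside = x <= 0 || y <= 0 || x >= windowSize.width || y >= windowSize.height;
  if (outside) disengage();
  return outside;
}

const a = { zIndex: 0 }, b = { zIndex: 0 };
bringToFront(a);
bringToFront(b);
console.log(a.zIndex, b.zIndex); // 1 2

let released = false;
checkWindowEdge(805, 300, { width: 800, height: 600 }, () => { released = true; });
console.log(released); // true
```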
Resizing is done using the same principles as moving Cards on the screen. In this case, however, it is the cardRszLayern that listens for the onmousedown and onmousemove events, and the attached script function causes the cardLayern to be resized rather than moved. Again, the same types of subtle improvements can be added (changing background colour etc.).
Dropping items on a tab or the wastebin is accomplished by checking the mouse coordinates when the mouse button is released to see whether they are within the boundaries of the wastebin or one of the Leaf tabs. If the Card is over the wastebin it is deleted, and if it is over a Leaf tab it is moved to the corresponding Leaf.
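The drop test is a point-in-rectangle check against the wastebin and each Leaf tab. In a browser the rectangles would come from the positions and sizes of the corresponding DIV elements; here they are plain objects, and the returned action names are hypothetical.

```javascript
// True if point (x, y) lies inside rectangle r = { left, top, width, height }.
const inside = (x, y, r) =>
  x >= r.left && x < r.left + r.width && y >= r.top && y < r.top + r.height;

// Decides what releasing the mouse at (x, y) means for the dragged card.
function dropTarget(x, y, wastebin, tabs) {
  if (inside(x, y, wastebin)) return { action: "delete" };
  const i = tabs.findIndex((tab) => inside(x, y, tab));
  if (i !== -1) return { action: "moveToLeaf", leaf: i };
  return { action: "reposition" }; // an ordinary move within the current Leaf
}

const wastebin = { left: 740, top: 540, width: 50, height: 50 };
const tabs = [0, 1, 2].map((i) => ({ left: 10 + i * 90, top: 5, width: 85, height: 20 }));
console.log(dropTarget(760, 560, wastebin, tabs)); // { action: 'delete' }
console.log(dropTarget(120, 15, wastebin, tabs));  // { action: 'moveToLeaf', leaf: 1 }
```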
Updating/Modifying.
Changes may be submitted to the database incrementally (as cards are moved, dropped in the wastebin or moved to another Leaf etc. ) or at the end of a session when the user is asked if they wish to save their new settings.
The mechanics are the same in either case. A third approach combines those two and allows the updates to be sent incrementally but not committed to the database until the user confirms them.
If data is sent to the server incrementally, the user does not need to wait for a response from the server before continuing, this processing goes on in the background.
In either situation it is important to ensure that all the updated data has been returned to the server before the main window is closed otherwise some changes will be lost.
This can be guarded against by setting the onunload method for the BODY Element of the RUI main window to give the user the option to delay the close until the data has all been received by the server.
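The bookkeeping behind incremental updates and the close-time guard can be sketched as a queue that counts requests not yet acknowledged by the server. The `send` function is injected (in the system described it would load a URL into a hidden IFRAME or set a cookie), and the acknowledgement callback and all names are hypothetical.

```javascript
// Sends updates in the background and tracks which are still outstanding.
function makeUpdateQueue(send) {
  let nextId = 1;
  const outstanding = new Set();
  return {
    // Queues one change, e.g. { op: "move", card: "c1", leaf: "l2" }.
    push(change) {
      const id = nextId++;
      outstanding.add(id);
      send(change, () => outstanding.delete(id)); // callback = server acknowledged
    },
    // For the unload handler: true if closing now could lose changes.
    hasPending() {
      return outstanding.size > 0;
    },
  };
}

const acks = [];
const queue = makeUpdateQueue((change, ack) => acks.push(ack));
queue.push({ op: "move", card: "c1", leaf: "l2" });
queue.push({ op: "delete", card: "c4" });
console.log(queue.hasPending()); // true
acks.forEach((ack) => ack());
console.log(queue.hasPending()); // false
```

The window's unload handler would consult `hasPending()` to decide whether to offer the user the option of delaying the close.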
Two alternative processes will now be described that can be used to pass the updates back to the server (without disruptive messages on the user's screen).
1. Using a FORM GET type method on a hidden IFRAME element.
Forms use two methods of returning data to web servers: the 'post' method, which was used earlier by the Client Interface to pass the data to be saved to the server, and the 'get' method. The latter method is used here.
When used on a form, the get method passes the parameters to be returned to the server as part of the URL, which may look something like:
    http://www.mydomain.com/cgi-bin/do-your-stuff?x=21&apples=210
This calls the script "do-your-stuff" and passes the parameters x=21 and apples=210.
This type of URL does not have to be created by a form.
If a hidden IFRAME element is created and its SRC attribute is set equal to the URL of the server-side script, with the required parameters appended after a '?', the server can read the parameters. Having used the cookie to confirm the identity of the user, the server-side script can update the user's database entries accordingly.
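Building such a URL is a matter of encoding each parameter and appending the pairs after the '?'. This sketch uses the standard `encodeURIComponent` function, so that values containing spaces, '&' or '=' survive the trip; the script paths are the illustrative one from the example above plus a hypothetical `/cgi-bin/update`.

```javascript
// Builds "<script>?name1=value1&name2=value2..." with each part URL-encoded.
function buildUpdateUrl(scriptUrl, params) {
  const query = Object.keys(params)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(String(params[k])))
    .join("&");
  return scriptUrl + "?" + query;
}

const url = buildUpdateUrl("http://www.mydomain.com/cgi-bin/do-your-stuff", { x: 21, apples: 210 });
console.log(url); // http://www.mydomain.com/cgi-bin/do-your-stuff?x=21&apples=210

// In a browser the hidden IFRAME would then be pointed at it, e.g.:
//   hiddenFrame.src = url;
console.log(buildUpdateUrl("/cgi-bin/update", { comment: "ads & offers" }));
// /cgi-bin/update?comment=ads%20%26%20offers
```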
2. Using Cookies to pass data back to the server.
Short lived cookies can pass data back to the server.
These are created with an expiry time of only a few seconds which is long enough to pass the data back to the server. This is achieved by calling the server script via a hidden IFRAME. Longer lived cookies can be used to hold data being transferred back to the server thereby reducing the risk of the user session being closed abruptly before the data has all been transferred. Each domain only has a limited number of cookies available and so longer lived cookies would need very careful management.
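A short-lived cookie of the kind described might be composed as below. `expires` and `path` are standard cookie attributes; the cookie name `rui_update`, the JSON encoding of the payload and the five-second lifetime are assumptions for illustration. The current time is passed in so that the result is predictable.

```javascript
// Composes a cookie string that expires `seconds` after `now` (ms since epoch).
// In a browser it would be stored with: document.cookie = cookieString;
function shortLivedCookie(name, data, seconds, now) {
  const expires = new Date(now + seconds * 1000).toUTCString();
  return (
    encodeURIComponent(name) + "=" + encodeURIComponent(JSON.stringify(data)) +
    "; expires=" + expires + "; path=/"
  );
}

const c = shortLivedCookie("rui_update", { card: "c1", leaf: "l2" }, 5, Date.UTC(2001, 2, 20, 12, 0, 0));
console.log(c);
```

The server-side script, called immediately afterwards via the hidden IFRAME, would decode the value and apply the change before the cookie expires.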
Cards dropped in the wastebin or moved to another Leaf.
When a Card is dropped in the wastebin a message is sent to the server (either immediately or at the end of the session depending on how the system is configured) telling the database to delete this Card from the User's Leaf (and hence View). If the Element, contained in the Card being deleted, is not associated with any other Card it is also deleted from the database.
When a Card is moved to another Leaf, the database is updated to change the Card's Owner Leaf. Next time that View is loaded, the Card will appear in the new Leaf.
The script keeps its own record of which Leaf each Card belongs to, based on the data first loaded and the changes the user has made since, so the data does not need to be refetched from the database when a new Leaf is displayed.
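The two database updates can be sketched against in-memory stand-ins for the Card and Element tables. In the system described these would be SQL UPDATE and DELETE statements run by the server-side script; the function and field names here are hypothetical.

```javascript
// Moves a card to another Leaf by changing its Owner Leaf.
function moveCard(tables, cardId, newLeafId) {
  const card = tables.cards.find((c) => c.id === cardId);
  if (card) card.ownerLeaf = newLeafId;
}

// Deletes a card; its Element is deleted too if no other card still uses it.
function deleteCard(tables, cardId) {
  const card = tables.cards.find((c) => c.id === cardId);
  if (!card) return;
  tables.cards = tables.cards.filter((c) => c.id !== cardId);
  const stillUsed = tables.cards.some((c) => c.element === card.element);
  if (!stillUsed) tables.elements = tables.elements.filter((e) => e.id !== card.element);
}

const tables = {
  cards: [
    { id: "c1", ownerLeaf: "l1", element: "e1" },
    { id: "c2", ownerLeaf: "l2", element: "e1" },
    { id: "c3", ownerLeaf: "l1", element: "e2" },
  ],
  elements: [{ id: "e1" }, { id: "e2" }],
};
moveCard(tables, "c3", "l2");
deleteCard(tables, "c1"); // e1 survives: card c2 still shows it
deleteCard(tables, "c3"); // e2 removed: no card is left for it
console.log(tables.elements.map((e) => e.id)); // [ 'e1' ]
```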
Uploading data from a user's browser-based favorites/bookmark collection: In IE5, calling 'window.external.ImportExportFavorites' from a script allows the repository server to obtain a copy of the user's favorites collection. Microsoft chose to format this data in the format of Netscape's bookmark file. In Netscape, a signed script can easily be given permission to obtain a copy of the user's bookmark file.
In either case what is received at the server is a set of bookmarks in Netscape bookmark file format. This file is an HTML file setting out the bookmarks in an HTML definition list. It is a well-structured file consisting largely of <A> links with text descriptors, which can easily be parsed and uploaded into a basic set of text-based Elements and Cards in a repository according to the embodiment.
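Parsing the bookmark file can be sketched with a regular expression over its `<A HREF="...">` entries. This is a simplification (a fuller parser would also keep the folder headings and decode all HTML entities), and the output shape, one `{ url, text }` object per link ready to become a text-based Element and Card, is an assumption.

```javascript
// Extracts { url, text } pairs from a Netscape-format bookmark file.
function parseBookmarks(html) {
  const links = [];
  const re = /<A\s+[^>]*?HREF="([^"]*)"[^>]*>([\s\S]*?)<\/A>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    // Strip any inner tags and decode the most common entity.
    const text = m[2].replace(/<[^>]*>/g, "").replace(/&amp;/g, "&").trim();
    links.push({ url: m[1], text });
  }
  return links;
}

const sample = [
  "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
  "<TITLE>Bookmarks</TITLE>",
  "<DL><p>",
  "    <DT><H3>News</H3>",
  "    <DL><p>",
  '        <DT><A HREF="http://www.example.com/" ADD_DATE="985000000">Example &amp; Co</A>',
  "    </DL><p>",
  '    <DT><A HREF="http://www.example.org/shop">Shop</A>',
  "</DL><p>",
].join("\n");

console.log(parseBookmarks(sample));
```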
Having described the construction and operation of preferred embodiments, some points will now be described in greater detail.
The definition of a meaningful collection of elements is specific to HTML, and in particular to HTML as it is currently defined. Different rules would be used for a different markup language, and new rules or modifications to the following rules may be necessary if further additions or modifications are made to the HTML specifications. It is to be understood that the present invention is not limited to HTML or to any particular markup language.
The rules, whilst hard-coded in the current implementation, could be derived from the HTML DTD referred to below. This type of approach would allow application to other visual XML/SGML type applications.
In some cases the tagName is used as a shortcut to identify the Element, e.g. '<BODY>' instead of 'an Element Node with tagName = "BODY"'. In doing so it should be noted that the tag need not always appear in the raw HTML file for the associated Element to exist within the DOM.
1. Skeletal Elements: Used to Stop Node Traversal
These are the tags that are used to stop the traversal up through the DOM Node tree. In broad terms they provide the skeleton of the document. If the script encounters either of the following, it stops searching for a further parentNode: <BODY>, <IFRAME>.

2. Base Nodes of Meaningful Collections
The HTML4 Strict Document Type Definition defines groups of elements known as Entities, identifiable as %name. Those that come under the following definitions form common ancestors to meaningful collections of elements (note that one or two elements are overruled in the list of excluded elements below): %fontstyle, %phrase, %special, %block.
In addition, the following Elements are considered meaningful:
<BODY>: special case, see below.
<FONT>: strictly speaking this should be ignored as a deprecated Element, but it is still in very common use.
In practice, however, one or two of these may be excluded as they are not very meaningful. For example, <BR> (within %special) is merely a forced line break, and <HR> (within %block) merely a horizontal rule.
3. Special Cases Some elements receive special treatment in order to capture the appropriate information. Specifically: < MAP > , which is included within % special has no meaning without an associated < IMG > , < OBJECT > or < INPUT > -the script therefor searches for the appropriate'partner' element.
< BODY > . The content of a BODY Element will be displayed within a DIV Element in the repository so the content is placed within a new < DIV > element instead.
Text Nodes are not elements, so a parent Element is created for them to allow them to be added to the repository.
4. Non-Meaningful Elements
The following Elements are not considered meaningful and are passed over during all Node traversals, but they will be included (where possible) within the saved DOM subtree.
<DEL>, <INS>: these are used to track changes in documents.
Deprecated Elements such as <APPLET>, <CENTER>, <DIR>, <ISINDEX>, <MENU>, <S>, <STRIKE> and <U>.
Elements that only exist within the HEAD element, such as <META> and <STYLE>.
<NOFRAMES>, <NOSCRIPT>: technically these are meaningful elements, but by their very nature they will not be saved by the script in the latest browsers. Because IE5 and NN6 support both frames and scripts, these fallback tags have no meaning in this context.
<HTML>, <HEAD>, <FRAMESET>, <FRAME>: these cannot be reached by the script.
Elements that exist exclusively within <TABLE>, <FORM> or <OBJECT>, where not specifically allowed by other rules; these include, for example, <TD>, <TBODY> and <SELECT>.
5. Excluded Elements
<SCRIPT> elements are excluded, as their content can have unforeseen effects on the behaviour of the repository.
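The rules above can be sketched as a small classifier plus an upward walk over parentNode. This is illustrative only.

- It uses Python's xml.dom.minidom as a stand-in for the browser DOM that the script actually works on.
- It transcribes the %fontstyle, %phrase, %special and %block groups from the HTML 4.01 Strict DTD by hand.
- The names MEANINGFUL, SKELETAL and candidate_ancestors are hypothetical.

Starting from the node under the mouse, it collects each meaningful ancestor and stops at the first skeletal element. Non-meaningful nodes such as <TD> and <TR> are passed over.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Parameter entities transcribed from the HTML 4.01 Strict DTD.
FONTSTYLE = {"TT", "I", "B", "BIG", "SMALL"}
PHRASE = {"EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR", "CITE",
          "ABBR", "ACRONYM"}
SPECIAL = {"A", "IMG", "OBJECT", "BR", "SCRIPT", "MAP", "Q", "SUB", "SUP",
           "SPAN", "BDO"}
BLOCK = {"P", "H1", "H2", "H3", "H4", "H5", "H6", "UL", "OL", "PRE", "DL",
         "DIV", "NOSCRIPT", "BLOCKQUOTE", "FORM", "HR", "TABLE", "FIELDSET",
         "ADDRESS"}

# Rule 1: skeletal elements stop the upward traversal.
SKELETAL = {"BODY", "IFRAME"}
# Rules 2, 4 and 5: the DTD groups plus BODY and FONT, minus the over-ruled
# (BR, HR), non-meaningful (NOSCRIPT) and excluded (SCRIPT) elements.
OVERRULED = {"BR", "HR", "NOSCRIPT", "SCRIPT"}
MEANINGFUL = (FONTSTYLE | PHRASE | SPECIAL | BLOCK | {"BODY", "FONT"}) - OVERRULED


def candidate_ancestors(node):
    """Walk up parentNode from the node under the mouse, collecting the
    meaningful elements and stopping at the first skeletal element."""
    if node.nodeType == Node.TEXT_NODE:
        node = node.parentNode  # text nodes are not elements
    candidates = []
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        tag = node.tagName.upper()
        if tag in MEANINGFUL:
            candidates.append(node)
        if tag in SKELETAL:
            break
        node = node.parentNode
    return candidates


page = minidom.parseString(
    "<HTML><BODY><DIV><TABLE><TR><TD>"
    "<A href='x.html'><B>Apricots</B></A>"
    "</TD></TR></TABLE></DIV></BODY></HTML>")
text = page.getElementsByTagName("B")[0].firstChild
print([el.tagName for el in candidate_ancestors(text)])
# ['B', 'A', 'TABLE', 'DIV', 'BODY']
```

Here the <TD> and <TR> ancestors are skipped. The walk ends at <BODY>, which is both meaningful and skeletal.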
Rules for Treatment of Special Cases
For some types of nodes the script must find associated nodes or data.
For example, if a user activates the context menu over an image map ('<MAP>'), the Node returned by the context menu is actually the Node of the Map. The Map may be used by IMG, OBJECT or INPUT elements to trigger different actions, such as moving to different parts of the page or opening specific new pages. It is therefore necessary to search these other Nodes to find the element that is matched to the MAP.
Specifically, the collection of images in the document can be obtained from the array of image Nodes held in 'document.images' within the DOM. It is then a simple matter to scan through these to find the images using an image map, and in particular the one using the image map on which the mouse was placed. OBJECT and INPUT nodes can be searched by examining the NodeList returned by getElementsByTagName("OBJECT") or getElementsByTagName("INPUT") at the document level. In another situation, style sheets or style definitions may be needed to interpret the class attributes of nodes, but the presently preferred embodiment extracts the style information of each node independently, so this is not necessary. If global style settings are to be captured, they can be obtained by a straightforward DOM function call.
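A minimal sketch of the partner search for a <MAP> follows. It again uses xml.dom.minidom in place of the browser DOM, and assumes the HTML 4 convention that a partner element refers to the map through usemap="#name". In a browser the IMG elements could equally be taken from document.images; here getElementsByTagName is used for all three tag names. The function name find_map_partners is hypothetical.

```python
from xml.dom import minidom


def find_map_partners(document, map_node):
    """Return the IMG, OBJECT and INPUT elements whose usemap attribute
    refers to the given MAP element (usemap="#name")."""
    target = "#" + map_node.getAttribute("name")
    partners = []
    for tag in ("IMG", "OBJECT", "INPUT"):
        # getAttribute returns "" when usemap is absent, so those never match.
        for element in document.getElementsByTagName(tag):
            if element.getAttribute("usemap") == target:
                partners.append(element)
    return partners


page = minidom.parseString(
    "<BODY>"
    "<MAP name='nav'><AREA href='a.html'/></MAP>"
    "<IMG src='banner.gif' usemap='#nav'/>"
    "<IMG src='logo.gif'/>"
    "<INPUT type='image' src='go.gif' usemap='#nav'/>"
    "</BODY>")
map_node = page.getElementsByTagName("MAP")[0]
print([p.getAttribute("src") for p in find_map_partners(page, map_node)])
# ['banner.gif', 'go.gif']
```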
Rules for Capturing Single or Combinations of Non-Meaningful Nodes
It was stated that '<TD>' and '<TR>' tags do not represent meaningful collections of elements. In isolation, without a '<TABLE>' tag, they do not represent well-formed HTML. To the user, however, it is appealing to be able to select rows from tables or groups of adjacent cells.
It is therefore made possible to select combinations of nodes which share a common ancestor node type. For example, table data or table rows could be lifted from the table. In this situation the script would create a new ancestor of the appropriate type possibly using the formatting attributes of the actual table from which they are being selectively extracted.
For example, one or more <TD> nodes would be surrounded by a <TR> node. One or more <TR> nodes would be surrounded by a <TABLE> node, or by a suitable combination of <COL>, <ROW>, <TBODY> and <TABLE> nodes. The latter approach requires an analysis of the elements of the TABLE, identification of which rows and columns are affected, and extraction of the required formatting information. If complete rows or columns are selected, then row and column headings could also be picked up.
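The re-wrapping of table fragments might look like the following sketch. It handles only the simple cases (cells from one row, or whole rows) and copies the source table's attributes onto the new <TABLE>. It does not attempt the <COL>/<TBODY> analysis described above. It uses xml.dom.minidom as a stand-in for the browser DOM, and rewrap_table_fragment is a hypothetical name. Each selected node is deep-cloned, so the page itself is left untouched.

```python
from xml.dom import minidom


def enclosing(node, tag):
    """Nearest ancestor element with the given tagName, or None."""
    node = node.parentNode
    while node is not None and getattr(node, "tagName", None) != tag:
        node = node.parentNode
    return node


def rewrap_table_fragment(document, nodes):
    """Wrap selected TD or TR nodes in a new, well-formed TABLE that
    copies the formatting attributes of the table they came from."""
    tags = {n.tagName for n in nodes}
    table = document.createElement("TABLE")
    source = enclosing(nodes[0], "TABLE")
    if source is not None:
        for name, value in source.attributes.items():
            table.setAttribute(name, value)
    if tags == {"TD"}:
        container = document.createElement("TR")  # cells need a row
        table.appendChild(container)
    elif tags == {"TR"}:
        container = table  # rows go straight into the table
    else:
        raise ValueError("expected only TD or only TR nodes")
    for n in nodes:
        container.appendChild(n.cloneNode(True))  # deep copy of each node
    return table


page = minidom.parseString(
    "<TABLE border='1'><TR><TD>a</TD><TD>b</TD><TD>c</TD></TR>"
    "<TR><TD>d</TD><TD>e</TD><TD>f</TD></TR></TABLE>")
cells = page.getElementsByTagName("TD")[:2]
print(rewrap_table_fragment(page, cells).toxml())
# <TABLE border="1"><TR><TD>a</TD><TD>b</TD></TR></TABLE>
```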
It was stated that, strictly speaking, Text Nodes do not represent meaningful elements. Some of the time a Text Node will be the childNode of a text formatting Element.
In this case the collection of Elements is captured at the formatting Element level. However, it is quite common for Text Nodes to appear independently of formatting elements, for example within a Link (or <A>) Node. The embodiment must therefore transform this type of Node into an Element in order to save and subsequently display the text. This is done by embedding the text within a suitable neutral formatting element, such as a Paragraph (<P>) element.
Additionally, the <BODY> element cannot be saved as-is within a <DIV> element. This situation is handled by extracting its childNodes and giving them a new parent Node of type <DIV>.
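Both treatments can be sketched as one hypothetical helper, to_savable_element, operating on an xml.dom.minidom tree as a stand-in for the browser DOM. A bare Text Node is given a neutral <P> parent. The children of <BODY> are re-parented under a new <DIV>. Any other element is simply deep-copied.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def to_savable_element(document, node):
    """Return a detached copy of node that can be stored in the repository."""
    if node.nodeType == Node.TEXT_NODE:
        # A bare text node gets a neutral formatting parent.
        wrapper = document.createElement("P")
        wrapper.appendChild(node.cloneNode(False))
        return wrapper
    if node.tagName == "BODY":
        # BODY cannot sit inside the repository's DIV: re-parent its children.
        wrapper = document.createElement("DIV")
        for child in node.childNodes:
            wrapper.appendChild(child.cloneNode(True))
        return wrapper
    return node.cloneNode(True)


page = minidom.parseString(
    "<BODY><A href='x.html'>Apricots are tasty</A><B>hi</B></BODY>")
link_text = page.getElementsByTagName("A")[0].firstChild
print(to_savable_element(page, link_text).toxml())
# <P>Apricots are tasty</P>
print(to_savable_element(page, page.documentElement).toxml())
# <DIV><A href="x.html">Apricots are tasty</A><B>hi</B></DIV>
```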
Facilitating, in this way, the combination or recharacterisation of 'independent non-meaningful' elements into one or more meaningful collections opens up a wide range of possibilities, such as saving selected rows or cells of a table.
Extracting HTML from the DOM
At least three different techniques could be employed for extracting the pertinent data from the DOM.
The first approach, described above, scans the Node subTree, extracting tagName, attributes, style settings and nodeValues. The two main alternatives are to clone the Node and its descendants, or to use a non-DOM method implemented in IE (and, it is believed, in NN6 when it is released officially).
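As an illustration of the first approach, the sketch below rebuilds markup by walking the subtree recursively. It rewrites SRC and HREF values to absolute URLs with urllib.parse.urljoin as it goes, so that the full path is captured. It is a sketch under stated assumptions:

- It runs on an xml.dom.minidom tree rather than a live browser DOM.
- It omits the per-node style settings, which a browser would supply.
- It treats only the listed HTML 4 EMPTY elements as having no end tag.
- The names serialise, URI_ATTRIBUTES and VOID are hypothetical.

```python
from urllib.parse import urljoin
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.sax.saxutils import escape, quoteattr

URI_ATTRIBUTES = {"src", "href"}  # other URI-type attributes could be added
VOID = {"AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT", "LINK", "META",
        "PARAM"}


def serialise(node, base_url):
    """Recreate the implied markup of a subtree, making URIs absolute."""
    if node.nodeType == Node.TEXT_NODE:
        return escape(node.nodeValue)
    if node.nodeType != Node.ELEMENT_NODE:
        return ""  # comments, processing instructions etc. are dropped
    parts = ["<" + node.tagName]
    for name, value in node.attributes.items():
        if name.lower() in URI_ATTRIBUTES:
            value = urljoin(base_url, value)  # capture the full path
        parts.append(" %s=%s" % (name, quoteattr(value)))
    parts.append(">")
    if node.tagName.upper() in VOID:
        return "".join(parts)  # EMPTY elements have no content or end tag
    for child in node.childNodes:
        parts.append(serialise(child, base_url))
    parts.append("</%s>" % node.tagName)
    return "".join(parts)


page = minidom.parseString(
    "<A href='/offer'><IMG src='img/apricot.gif'/>Apricots &amp; more</A>")
print(serialise(page.documentElement, "http://www.example.com/ads/page.html"))
```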
Cloning or Importing the subTree
The actual DOM subTree of an element can be copied, eliminating the need to recreate the HTML only to have the browser parse it back into the DOM as a copy.
The structure and content of the Node and all its descendants can be copied using the cloneNode method of the Node in question, or the importNode method of the target document. Setting the deep-clone option (the boolean 'deep' argument) forces a copy of all the descendant Node data: the result is not a pointer to the original subTree but a full copy of all its content. This allows the Node data to be transferred to the new window.
The data must then be transferred to the database on the repository server. Since there is no means of transferring this data to the server in its native DOM form, it is necessary to 'translate' the data into its implied raw HTML in order to transfer it as text.
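The difference between a shallow and a deep clone, and the final 'translation' step, can be shown with xml.dom.minidom. Its cloneNode(deep) mirrors the DOM method of the same name. Here toxml() stands in for whatever serialisation the script performs before sending the text to the server. After the deep clone is taken, the original page is edited to show that the copy is independent of it.

```python
from xml.dom import minidom

page = minidom.parseString("<DIV><P>Apricots <B>are</B> tasty</P></DIV>")
original = page.getElementsByTagName("P")[0]

shallow = original.cloneNode(False)  # the element only, no descendants
deep = original.cloneNode(True)      # a full, independent copy of the subtree

# Edit the page after cloning: the deep copy is not a pointer to it.
original.getElementsByTagName("B")[0].firstChild.data = "were"

payload = deep.toxml()  # 'translate' the copy into text for the server
print(shallow.toxml())  # <P/>
print(payload)          # <P>Apricots <B>are</B> tasty</P>
```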
If a method is developed to transmit the native DOM data to the server this approach may offer significant ease of programming and efficiency benefits over the approach described in the main body of the description.
Using the innerHTML data
Internet Explorer provides access to its own version of the implied raw HTML of a Node and its descendants in the form of the innerHTML property. Because of developer pressure, NN6 is likely to include this field as well when it is released.
This data is not within the DOM specification and should not be used if DOM compliance is considered important.
Other DOM-compliant browsers may not offer this field, so their users would be unable to use this method if it relied on this data field.
There are efficiency benefits in using this data, as it eliminates the need to extract the childNode, attribute, style and nodeValue data recursively, but it has significant drawbacks. As was described earlier, 'SRC', 'HREF' and other URI-type attributes must often be modified to ensure that the full path is captured in the database. If the innerHTML data field were used, it would be necessary to search it for instances of 'SRC' and 'HREF' and make the suitable amendments. Ensuring that only the instances where 'SRC' and 'HREF' are used as Node attributes are amended would require involved logic, and may well end up being less efficient than recursively extracting the information from the tree. If a suitable (robust and efficient) method were found, the use of innerHTML could be considered in a commercial environment.
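The difficulty can be seen in a short sketch that uses a regular expression as a stand-in for a simple search over an innerHTML string. The pattern and the names naive_rewrite and BASE are hypothetical. Serialised HTML does not normally escape quotation marks in text content, so text that merely mentions an href attribute is rewritten along with the real attribute.

```python
import re
from urllib.parse import urljoin

BASE = "http://www.example.com/ads/page.html"

# Naive approach: rewrite every src="..." or href="..." found in the string.
NAIVE = re.compile(r'\b(src|href)="([^"]*)"', re.IGNORECASE)


def naive_rewrite(inner_html, base_url):
    return NAIVE.sub(
        lambda m: '%s="%s"' % (m.group(1), urljoin(base_url, m.group(2))),
        inner_html)


# innerHTML-style markup whose *text* happens to mention an href attribute.
inner_html = '<A href="/offer">Write href="/offer" in your page</A>'
print(naive_rewrite(inner_html, BASE))
# The attribute is fixed, but the visible text is altered too.
```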
In the description of the repository user interface it was mentioned that a Hierarchical Tabular Representation with Views could be adopted. An example of such a representation is shown in Figure 11. Here, the user has previously saved five elements and has opened the repository choosing to use a simple tabular interface.
Three table headings are shown, although by configuring the site, the user can add as many as she wishes.
The individual images and their links can be recategorised by selecting the table headings from the dropdown menus to the left of each element. Sub-categories are also available, allowing a hierarchical representation of the bookmarked elements similar in functionality to browser bookmarks and other online bookmark services, albeit with a visual (as opposed to text-based) representation of the bookmarked elements.
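A hypothetical in-memory model of the hierarchical view is sketched below. It is not the database structure referred to above, which is not reproduced here. Each heading is a Category that may hold saved elements (represented by simple ids) and sub-categories. The recategorise function mimics choosing a new heading from an element's dropdown menu.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A table heading or sub-category in the repository view."""
    name: str
    children: list = field(default_factory=list)  # sub-categories
    elements: list = field(default_factory=list)  # saved elements (ids here)


def walk(category):
    """Yield a category and all of its descendants, depth first."""
    yield category
    for child in category.children:
        yield from walk(child)


def find(root, name):
    return next((c for c in walk(root) if c.name == name), None)


def recategorise(root, element, target_name):
    """Move a saved element under another heading."""
    target = find(root, target_name)
    if target is None:
        raise KeyError(target_name)
    for category in walk(root):
        if element in category.elements:
            category.elements.remove(element)
    target.elements.append(element)


root = Category("Repository", children=[
    Category("Shopping", children=[Category("Food")]),
    Category("News"),
    Category("Travel"),
])
find(root, "News").elements.append("apricot-ad")
recategorise(root, "apricot-ad", "Food")
```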
This interface to the repository can be used with the same database structure as was described earlier, but uses fewer of the customisation settings.
As has been mentioned, the invention is not limited to HTML, but is applicable to any SGML-based system, including visually representable XML. Many systems developers are storing 'documents' in XML format, to allow easier cross-platform development, conversion from one application to another, and even the embedding of different types of documents within each other.
In the near future, sophisticated word processing documents and spreadsheets will become part of a web-page, and vice-versa. The distinction between web-pages written in HTML and other types of documents, now stored in XML, will become increasingly blurred.
It is therefore important to recognise that the various aspects of the invention are applicable to all types of XML, as long as there is an application (such as the web browsers used, or an advanced word processor) that can parse and display this information, and as long as there is suitable access to the DOM.
The latest versions of the main web-browsers and the specification for the DOM and CSS are anticipating the inclusion of a broader set of markup tags and data into the web-browsing context. By setting out the rules for defining meaningful elements and collections of elements, as defined by their ancestor, exclusively in terms of the DTD for the XML being parsed, the various aspects of the invention can be applied to all forms of browser-parseable XML.
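Deriving the rules from the DTD, rather than hard-coding them, can be sketched by reading parameter-entity declarations and expanding nested %name; references into sets of element names. The regular expression below copes only with simple alternation lists such as those in DTD_EXCERPT, which is an excerpt of the HTML 4.01 Strict DTD. Content models with parentheses or #PCDATA, and comments inside declarations, are not handled. The function names are hypothetical.

```python
import re

# Matches simple parameter-entity declarations such as
#   <!ENTITY % list "UL | OL">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')


def parameter_entities(dtd_text):
    """Map each declared parameter entity name to its replacement text."""
    return dict(ENTITY_DECL.findall(dtd_text))


def expand(name, entities, _stack=()):
    """Expand a parameter entity into the set of element names it covers,
    following nested %name; references."""
    if name in _stack:
        raise ValueError("circular entity reference: " + name)
    names = set()
    for token in entities[name].split("|"):
        token = token.strip()
        if token.startswith("%"):
            names |= expand(token[1:].rstrip(";"), entities, _stack + (name,))
        elif token:
            names.add(token)
    return names


DTD_EXCERPT = """
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % list "UL | OL">
<!ENTITY % preformatted "PRE">
<!ENTITY % block
     "P | %heading; | %list; | %preformatted; | DL | DIV | NOSCRIPT |
      BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">
"""
block_elements = expand("block", parameter_entities(DTD_EXCERPT))
print(sorted(block_elements))
```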
As long as the browser is able to parse and display the XML then it is possible to capture and store most meaningful elements.
The user interface would remain the same, as would most of the underlying code. However, there are some methods specified in the DOM specifically for dealing with XML that would need to be used in place of their HTML equivalents.
Implementation of this would be well within the capabilities of those skilled in the art.
The Repository User Interface would be suitable to store, display and organise visually parseable XML, if provided with suitable style sheets.
Some of the special treatment of specific HTML elements, such as the resizing of elements, would not work 'out of the box', and some customisation of the application may be required for specific instances, or to take advantage of the functionality of specific situations, such as a musical notation implementation that incorporates sound.
Turning now to figures 12 to 15, the invention itself will be described. The invention is an enhancement of the system described above and is specifically aimed at overcoming the problem, discussed in the introduction, of saving links to other sites that are accompanied, for example, by an advertisement.
As discussed above, this can be problematic: if the user visits the advertiser's website and then returns with the intention of saving the link, he can find that a different advertisement is being displayed and the opportunity to save the link has gone.
By way of example, figure 12 shows a typical advertisement which may appear on a website. The advertisement is for amazon.co.uk and consists of a link containing an image and the text 'Apricots are tasty'. Instead of simply clicking on the advertisement to activate a link to the advertiser's website, the user now activates a context menu, for example by right-clicking the mouse. The context menu is then displayed over the advertisement, as shown in figure 13. It will be seen that the context menu differs from that shown in figure 2 in that it includes an extra option, 'Copyn Click the Link'. By selecting this option, rather than simply clicking on the advertisement, the browser software will not only follow the link to the new page or document (the advertiser's website), but will also capture and hold the link data from the Document Object Model (DOM). When the new page or document is viewed in the browser, the temporarily held information from the referring page is available in addition to any information the system described above is able to glean from the current document.
Thus, in figure 14, a temporary copy of the text 'Apricots are tasty' and a reference to the image are made, and the advertiser's website (http://www.amazon.co.uk) is opened as if the user had merely clicked on the link. The user can now activate the system described above with respect to figures 1 to 11, which will cause the window shown in figure 14 to be displayed. The user has the additional option of a checkbox labelled 'Click to attach saved image as link to page'.
The user then selects this option by clicking on the checkbox and is shown the image and text which they selected using the 'Copyn Click' option of figure 13.
This is shown in figure 15, where the text 'Apricots are tasty' and the image are displayed within the window. The user can then save this 'saved' image and text to their repository by clicking on 'Add to Copyn', thereby saving the image and text from the referring page but pointing to the resultant page.
In the system described with reference to figures 1 to 11, a list of candidate mark-up elements is drawn from the document or page which the user is viewing in their browser. In the embodiment of the invention described with respect to figures 12 to 15, the list of candidate mark-up elements includes elements that are determined by reference to the node in the Document Object Model representing the mark-up link used to navigate to the current document or page. This link could be on a different page or document on the same server, or on a page or document on a different server or domain.
Thus, returning to the example of figures 12 to 15, when the user 'Copyn Clicks' on the advertisement, he is taken to the advertiser's site. The advertisement on which he clicked is displayed to him in a window (figure 15). The list of candidate elements now also includes candidate elements from the originating page, determined from the node relating to the advertisement. This allows the user to choose the image of the advertisement and a link to the retailer's website, to remind him why he was interested in the retailer's site in the first place.
The embodiment described is not confined to actual links on the source page but can construct a new link based on the URL of the further page and associate this link with the image.
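Constructing that new link can be sketched as follows. The sketch assumes the stored markup is well-formed enough for xml.dom.minidom to parse; real-world HTML would need a more forgiving parser. The image and text captured from the referring page are moved into a new <A> element whose href is the URL of the page the user actually reached. The advertisement host ads.example and the function name attach_saved_link are hypothetical.

```python
from xml.dom import minidom


def attach_saved_link(stored_markup, further_url):
    """Build a new link around the image and text captured from the
    referring page, pointing at the page the user actually arrived at."""
    fragment = minidom.parseString(stored_markup)  # assumes well-formed markup
    old_link = fragment.documentElement
    new_link = fragment.createElement("A")
    new_link.setAttribute("href", further_url)
    for child in list(old_link.childNodes):
        new_link.appendChild(child)  # appendChild moves each node across
    return new_link


stored = ('<A href="http://ads.example/click?id=42">'
          '<IMG src="http://ads.example/apricot.gif"/>Apricots are tasty</A>')
candidate = attach_saved_link(stored, "http://www.amazon.co.uk/")
print(candidate.toxml())
```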
In the embodiment of the invention, a rule set is used to make the selection of elements and the script extracts the raw mark-up code for the link, with any associated images.
The raw markup is placed in a cookie to store it temporarily on the user's computer. This is only one possibility; others include storing the code in a memory associated with the browser, in a temporary store on the computer's disk, or even at the remote server. This action is initiated by the user selecting 'Copyn Click'. Once the temporary storage has been made, the system navigates to the address of the link in the usual manner.
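A sketch of the cookie step follows, written in Python for consistency with the other examples. In the browser this would be a document.cookie assignment, typically using encodeURIComponent. The markup is percent-encoded so that characters such as ';' and '=' cannot break the cookie syntax, and the cookie is scoped to the script's domain.

The cookie name copyn_link and the domain copyn.example are hypothetical. The 4096-byte check reflects the minimum per-cookie size that RFC 6265 asks user agents to support. That limit is measured over name, value and attributes, so the check here covers the whole Set-Cookie value.

```python
from urllib.parse import quote, unquote

COOKIE_NAME = "copyn_link"  # hypothetical cookie name
MAX_COOKIE_BYTES = 4096     # RFC 6265 minimum per-cookie size to support


def make_set_cookie(markup, script_domain, path="/"):
    """Build a Set-Cookie value holding the captured link markup."""
    value = quote(markup, safe="")  # percent-encode all reserved characters
    cookie = "%s=%s; Domain=%s; Path=%s" % (COOKIE_NAME, value,
                                            script_domain, path)
    if len(cookie.encode("ascii")) > MAX_COOKIE_BYTES:
        raise ValueError("captured markup is too large for a single cookie")
    return cookie


def read_cookie(cookie_header):
    """Extract and decode the captured markup from a Cookie header."""
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name == COOKIE_NAME:
            return unquote(value)
    return None


markup = ('<A href="http://www.amazon.co.uk/">'
          '<IMG src="http://ads.example/apricot.gif">Apricots are tasty</A>')
set_cookie = make_set_cookie(markup, "copyn.example")
# The browser later sends back only the name=value pair, alongside others.
cookie_header = "theme=dark; " + set_cookie.split(";")[0]
print(read_cookie(cookie_header) == markup)  # True
```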
When the user uses the system of figures 1 to 11 at the new document, the information held in the cookie (or elsewhere) is extracted and made available as an additional option when making the selection. This gives the user the option of choosing the information from the referring page as well as the options available in the example of figures 1 to 11.
It will be appreciated that the embodiment of the invention described makes detailed information from referring pages available. This is not possible in prior art systems, in which the only information available about a referring page or document is its URL and title. Once the page or document is no longer being displayed in the browser, it is extremely difficult to obtain that detailed information; it can only be found in difficult-to-access, low-level caches. The system embodying the invention overcomes this difficulty by temporarily storing the relevant information before the referring page is removed from memory.
Web browser security policies (the same-origin policy) are designed to stop web documents or scripts from obtaining information about documents being viewed that originate from a different domain. The context-menu extension in the user's browser, described above, avoids this problem.
Scripts activated by the context menu have access to the Document Object Model of the page on which the context menu was activated. Thus, a script originating from the server of the provider of the system embodying the invention has access to information about documents originating from other domains. This is only available when the user actively calls these scripts by using the context menu.
Cookies are usually associated with specific web domains.
Thus cookies from one domain cannot be read or changed by pages or scripts originating from a different domain. This problem is avoided by storing the information in a cookie associated with the domain of the script rather than with that of the document being viewed. As a result, even if the new page or document is in a new domain, the script can still access the information held in the cookie and make it available.
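As an illustration of how cookies are tied to domains, the domain-matching rule of RFC 6265 (section 5.1.3) decides which requests carry a cookie set with a Domain attribute. It is sketched below in simplified form for lower-case host names. With the cookie scoped to the script's hypothetical domain copyn.example, requests for a script served from scripts.copyn.example match, while the advertiser's host does not.

```python
import ipaddress


def domain_matches(host, cookie_domain):
    """Simplified RFC 6265 domain-match for lower-case, canonical host names."""
    cookie_domain = cookie_domain.lstrip(".")  # a leading dot is ignored
    if host == cookie_domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses must match exactly
    except ValueError:
        pass  # not an IP address, so suffix matching applies
    return host.endswith("." + cookie_domain)


COOKIE_DOMAIN = "copyn.example"  # hypothetical domain of the script's server
for host in ("scripts.copyn.example", "www.amazon.co.uk", "notcopyn.example"):
    print(host, domain_matches(host, COOKIE_DOMAIN))
```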
It would be possible to modify web browsing software to make the information held in the cookie available to a script such as that described with respect to figures 1 to 11. The temporary data capture described would then capture all mark-up links followed by the user without the need to activate the context menu.
Other methods of intercepting a click on a mark-up link could be used to call the script that saves the information in the cookie. For example, a keyboard modifier could be used, such as holding down the Shift key when clicking on a link; the cookie would be populated with the appropriate information only when that modifier was used. The use of an option in the context menu, as described, is preferred only because it can be used in all commercially available browsers; using a keyboard modifier may require some modification to the browser software.
Other modifications are possible and will occur to those skilled in the art without departing from the scope of the invention which is defined in the following claims.

Claims (21)

1. A method of storing a portion of a first mark-up language page containing a link to a further mark-up language page, comprising the steps of: identifying, from a visual representation of the first page, a portion of the visual representation including a link to the further page; storing the portion including the link in a temporary store; initiating the link to access the further page; at the further page, identifying a list of candidate mark-up elements from a predefined set of elements for storage, the list including the visual representation including the stored portion from the first page; selecting elements from the list; and storing the selected elements.
2. A method according to claim 1, wherein the step of identifying a list of candidate mark-up elements comprises identifying the nodes of the Document Object Model (DOM) of the further page and the nodes of the DOM associated with the link on the first page which represent the identified elements, and extracting the mark-up code for the identified nodes.
3. A method according to claim 2, wherein the step of extracting mark-up code relating to a visual representation including the stored portion comprises retrieving the code from the temporary store.
4. A method according to claim 1, 2 or 3, wherein the step of storing the portion temporarily and initiating the link are performed as a single user action.
5. A method according to any of claims 1 to 4, wherein the step of temporarily storing the link comprises storing the mark-up code for the link in a cookie.
6. A method according to claim 5, wherein the cookie is associated with the source domain of the script.
7. A method according to any preceding claim, wherein the steps of storing the link and initiating the link are performed by selecting a command from a displayable menu.
8. A method according to claim 7, wherein the menu is an Internet browser context menu.
9. A method according to any of claims 1 to 6, wherein the step of storing the link is performed by activating a keyboard command when initiating the link.
10. A method according to claim 5, wherein the data stored in the cookie is available to a script whereby activation of a link automatically causes the visual representation including the stored link to be stored temporarily.
11. A computer program comprising program code means which, when run on a computer, causes the computer to perform the steps of any of claims 1 to 10.
12. A computer program product comprising program code means stored on a computer readable medium which, when run on a computer, causes the computer to perform the steps of any of claims 1 to 10.
13. Apparatus for storing a portion of a first mark-up language page containing a link to a further mark-up language page, comprising: means for identifying, from a visual representation of the first page, a portion of the visual representation including a link to the further page; a temporary store for temporarily storing the portion including the link; a device for initiating the link to access the further page; means for identifying a list of candidate mark-up elements at the further page from a predefined set of elements for storage, the list including the visual representation including the stored portion from the first page; means for selecting elements from the list; and a store for storing the selected elements.
14. Apparatus according to claim 13, wherein the means for identifying a list of candidate mark-up elements comprises means for identifying the nodes of the Document Object Model (DOM) of the further page and the nodes of the DOM associated with the link on the first page which represent the identified elements, and means for extracting the mark-up code for the identified nodes.
15. Apparatus according to claim 14, wherein the means for extracting mark-up code relating to a visual representation including the stored portion comprises means for retrieving the code from the temporary store.
16. Apparatus according to claim 13, 14 or 15, comprising a user operated device for storing the portion temporarily and initiating the link as a single user action.
17. Apparatus according to any of claims 13 to 16, wherein the temporary store comprises a cookie.
18. Apparatus according to claim 17, wherein the cookie is associated with the source domain of the script.
19. Apparatus according to any of claims 13 to 18, comprising a menu selectable from a user browser, the menu including an option which causes a computer to store the link and initiate the link.
20. Apparatus according to claim 19, wherein the menu is an Internet browser context menu.
21. Apparatus according to claim 17, wherein the data stored in the cookie is available to a script whereby activation of a link automatically causes the visual representation including the stored link to be stored in the temporary store.
GB0106920A 2001-03-20 2001-03-20 Storage of a portion of a web-page containing a link Withdrawn GB2373698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0106920A GB2373698A (en) 2001-03-20 2001-03-20 Storage of a portion of a web-page containing a link

Publications (2)

Publication Number Publication Date
GB0106920D0 GB0106920D0 (en) 2001-05-09
GB2373698A true GB2373698A (en) 2002-09-25

Country Status (1)

Country Link
GB (1) GB2373698A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0859330A1 (en) * 1997-02-12 1998-08-19 Kokusai Denshin Denwa Co., Ltd Document retrieval apparatus
WO1998044434A1 (en) * 1997-03-27 1998-10-08 British Telecommunications Plc Data processing system and method
EP0944009A2 (en) * 1998-03-18 1999-09-22 Nortel Networks Corporation System and method for user-interactive bookmarking of information content

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1426877A2 (en) * 2002-11-27 2004-06-09 Microsoft Corporation Importing and exporting hierarchically structured data
EP1426877A3 (en) * 2002-11-27 2007-09-12 Microsoft Corporation Importing and exporting hierarchically structured data
AU2003262290B2 (en) * 2002-11-27 2009-09-17 Microsoft Technology Licensing, Llc Method and computer-readable medium for importing and exporting hierarchically structured data
US9218322B2 (en) 2010-07-28 2015-12-22 Hewlett-Packard Development Company, L.P. Producing web page content

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)