WO2001090873A1 - System and method for generating a wireless web page - Google Patents

System and method for generating a wireless web page

Publication number
WO2001090873A1
WO2001090873A1 PCT/US2001/016576 US0116576W WO0190873A1 WO 2001090873 A1 WO2001090873 A1 WO 2001090873A1 US 0116576 W US0116576 W US 0116576W WO 0190873 A1 WO0190873 A1 WO 0190873A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
page
hierarchical
web page
structure
Prior art date
Application number
PCT/US2001/016576
Other languages
French (fr)
Inventor
Brett Matthew Keating
Michael Scott Hohman
Ivan Aladjoff
Jose Fa Keating
Jacob Sullivan
Original Assignee
2Roam, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 2Roam, Inc.
Priority to AU2001264810A
Publication of WO2001090873A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Definitions

  • This invention relates generally to a system and method for permitting a user to analyze a document in order to break the document into a hierarchical structure and generate a new document, and in particular to a system and method for permitting a user to analyze an information source, such as a hypertext markup language (HTML) web page, an XML document, an ICE document (a content syndication format) or a Reuters feed, in order to generate one or more wireless web pages corresponding to the original information source.
  • atomics are small parts or portions of the web page.
  • the atomics may be clustered into groups which may in turn be clustered into bigger groups. For example, each top story on a web page may be an atomic while all of the stories together may be treated as a group.
  • the identification of the atomics in the web page and the hierarchical structure of the atomics is an important task. For example, a system may decompose the web page into the hierarchical atomics and then use the hierarchical atomics to restructure the web page for one or more different users, wherein each user may request slightly different parts of the web page or each user is using a device that can only handle certain pieces of the web page due to memory or screen size limitations.
  • the deconstructed web page may also be used for a variety of other purposes.
  • a system that permits a single user, such as the producer of a web page, to more easily deconstruct an information source, such as an HTML web page, an XML document, an ICE document (a content syndication format), a Reuters feed or the like, into its atomics, relate the atomics to each other and assign properties to the atomics so that newly formatted wireless pages for one or more different wireless devices may be automatically generated, and it is to this end that the present invention is directed.
  • a web page is often dynamic in that the content in the web page may constantly change or be updated. For example, web pages containing information about the news, trading information, shopping information and the like must be continuously updated to reflect changes.
  • When a web page is dynamic, the process of attempting to generate a new page having a particular format from the original web page is even more difficult. For example, if a designer develops a new page when the original news story web page had two top stories, that new page becomes obsolete as soon as the stories change or the number of top stories changes because the page will no longer be accurate or up to date.
  • the task of trying to manually generate new pages based on a dynamic web page is extraordinarily difficult and very time consuming since the new pages quickly are out of date.
  • a system and method for generating a wireless web page in accordance with the invention is provided wherein an information source, such as an HTML web page, an XML document, an ICE document (a content syndication format) or a Reuters feed, may be automatically broken down into its constituent parts within a hierarchical format so that a new page having a different format or contents may be automatically generated based on the original information source.
  • the invention is particularly useful in the context of re-purposing an HTML web page for one or more different wireless devices wherein each wireless device has different memory and screen size limitations that make it necessary to generate a differently formatted series of pages (known as cards) for each wireless device.
  • the invention is also particularly useful for generating one or more different wireless pages from a dynamic web page for one or more different wireless devices having different screen sizes as described in more detail below.
  • the wireless web page generating system in accordance with the invention may be utilized by a producer of a web site who wishes to re-purpose the content from the web page to one or more different wireless devices wherein each different wireless device may have a different screen size so that each wireless web page must have a slightly different format.
  • the producer may, without help, re-purpose the web page for the multiple different wireless devices so that the wireless pages may be automatically generated by the wireless page delivery system.
  • the system and method for generating a wireless web page in accordance with the invention, known as the Nomad™ Wireless Toolkit, may include a graphic user interface (GUI) portion.
  • the system allows producers of web pages using the Wireless Toolkit to specify how their website content should appear to wireless devices and to then directly communicate the specifications of the desired web page to an intelligent harvesting and navigation system and method so that the wireless pages may be generated.
  • the system also permits the producer to preview their web site content on one or more wireless pages that emulate how it will appear on wireless devices.
  • the system permits the producer (also referred to herein as the user) to rapidly process a web page to generate a hierarchical list of atomics contained in the page and to produce a resultant page which has some or all of the atomics from the original web page.
  • the web page producer must typically re-format the web page for display on the device with the more limited memory or screen size.
  • the producer may also need to re-format the web page for many different wireless devices wherein each wireless device has a different size screen so that the page generated for each wireless device is unique.
  • the producer may define each of the wireless pages for each of the wireless devices so that wireless pages may be automatically generated and wireless pages for dynamic web sites may also be automatically generated.
  • an apparatus for processing an information source wherein the apparatus retrieves an information source and extracts one or more elements from the information source wherein each element comprises a piece of content within the information source.
  • the apparatus also generates a data structure that represents the hierarchical structure of the elements in the information source and processes the data structure in order to retrieve predetermined elements from the information source.
  • the apparatus may include a page viewing portion for viewing the page from which elements are being extracted, a page navigator portion for viewing a hierarchical list of elements extracted from the page, a user dragging an element from the page viewing portion to the page navigator portion to extract the element from the page, and an element property portion for viewing the properties of an element in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract elements from the page by simultaneously viewing the page and the hierarchical list of elements.
  • the apparatus converts the information source into a first hierarchical structure containing the content and the hierarchical structure and then determines a generalized path to the element in the information source so that the element is located even if the information source changes.
  • the first hierarchical structure comprises one or more nodes each containing an element wherein a particular element is located in a first node of the hierarchical structure and the generalized path determiner comparing a first node containing the data to each other node in the hierarchical structure to identify a unique node identifier.
  • the generalized path determiner also identifies a turning-point node associated with the first node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the first node, and specifies a descendants axis as a turning-point node if there are no descendants of the node that match the first node.
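  • The comparison described above can be illustrated with a minimal Python sketch. It is not the patented generalizer: the `Node` class, the `signature` function (tag plus attributes) and the choice of the nearest uniquely identifiable ancestor as a stand-in for the turning-point node are all assumptions made for illustration, and the descendants axis is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison, so parent links never recurse
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def signature(node):
    """Hypothetical node identifier: the tag plus its sorted attributes."""
    return (node.tag, tuple(sorted(node.attrs.items())))


def is_unique(node, root):
    """Compare the node to every node in the tree, as the text describes."""
    return sum(signature(n) == signature(node) for n in walk(root)) == 1


def generalized_path(target, root):
    """Climb from the target until a uniquely identifiable node is found.

    Returns that node's signature plus the child indexes leading back down.
    """
    steps = []
    node = target
    while node.parent is not None and not is_unique(node, root):
        steps.append(node.parent.children.index(node))
        node = node.parent
    return signature(node), steps[::-1]


def resolve(anchor_signature, steps, root):
    """Locate the anchor by signature, then follow the recorded steps."""
    node = next(n for n in walk(root) if signature(n) == anchor_signature)
    for index in steps:
        node = node.children[index]
    return node


def build_page(story, with_banner=False):
    """Build a toy page; the optional banner shifts every table position."""
    root = Node("html")
    body = root.add(Node("body"))
    if with_banner:
        body.add(Node("table", {"id": "banner"})).add(Node("tr")).add(Node("td", text="Ad"))
    body.add(Node("table", {"id": "nav"})).add(Node("tr")).add(Node("td", text="Home"))
    cell = body.add(Node("table", {"id": "stories"})).add(Node("tr")).add(Node("td", text=story))
    return root, cell
```

  • In this sketch the path is learned from one sample of the page and then resolved against a later sample in which a banner table has been added; because the path is anchored on a uniquely identifiable node rather than on positions counted from the root, the story cell is still found.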
  • a graphical user interface for extracting one or more atomics from an HTML web page includes a page viewing portion for viewing the page from which atomics and groups of atomics are being extracted, a page navigator portion for viewing a hierarchical list of atomics extracted from the page wherein a user drags an atomic from the page viewing portion to the page navigator portion to extract the atomic from the page, and an atomic property portion for viewing the properties of an atomic in the list of the page navigator portion.
  • the page viewing, page navigator and element property portions permit the user to rapidly extract atomics from the page by simultaneously viewing the page and the hierarchical list of atomics.
  • a graphical user interface for extracting one or more elements from an HTML web page that views a page from which atomics are being extracted, navigates the page by viewing a hierarchical list of atomics extracted from the page wherein the user drags an atomic from the page viewer to the page navigator to extract the atomic from the page, and includes an atomic property generator that extracts the properties from the atomic selected by the user so that the user views the page, the hierarchical list of atomics and the properties for a selected atomic simultaneously.
  • a method for processing a web page to re-purpose the web page for one or more wireless devices having different screen formats by determining paths to pieces of content in the web page is provided.
  • the method generates a first hierarchical structure based on the web page wherein the first hierarchical structure comprises the structure of the web page and the content in the web page.
  • the method then generates a second hierarchical structure of the web page from the first hierarchical structure wherein the second hierarchical structure comprises the structure of the web page wherein paths to the content are indicated.
  • the method then generates relative paths to the content in the web page wherein the relative paths are inserted into the second hierarchical structure, and robustifies the paths in the second hierarchical structure so that a search for content using a path to the content locates the content even if the web page has changed.
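  • Why a path needs to be made robust can be seen in a short Python sketch using the standard `xml.etree.ElementTree` module. The two page samples, the `id` values and both path strings are hypothetical, and anchoring on an attribute is only one possible way to make a path survive page changes; the source does not state that the invention does it this way.

```python
import xml.etree.ElementTree as ET

# Two samples of the same hypothetical dynamic page. In the second sample the
# story text has changed and an advertisement table has been added on top.
SAMPLE_1 = (
    "<html><body>"
    "<table id='nav'><tr><td>Home</td></tr></table>"
    "<table id='top'><tr><td>Story A</td></tr></table>"
    "</body></html>"
)
SAMPLE_2 = (
    "<html><body>"
    "<table id='ad'><tr><td>Banner</td></tr></table>"
    "<table id='nav'><tr><td>Home</td></tr></table>"
    "<table id='top'><tr><td>Story B</td></tr></table>"
    "</body></html>"
)

# Position-based path learned from sample 1: "the second table in the body".
ABSOLUTE_PATH = "./body/table[2]/tr/td"
# Path anchored on an identifying attribute instead of a position.
ROBUST_PATH = ".//table[@id='top']/tr/td"


def fetch(page_source, path):
    """Return the text found at the path, or None if nothing matches."""
    root = ET.fromstring(page_source)
    found = root.find(path)
    return None if found is None else found.text
```

  • With these samples, the position-based path returns the story from the first sample but the navigation cell from the second, while the attribute-anchored path returns the story from both.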
  • Figure 1 is a block diagram illustrating a wireless page delivery system
  • FIG. 2 is a block diagram illustrating the wireless web page generation system in accordance with the invention.
  • Figure 3 is a diagram illustrating an example of a portion of a web page broken down into one or more atomics in accordance with the invention
  • Figures 4a - 4c are diagrams illustrating an example of the user interface for the GUI tool and in particular the integrated desktop, HTML viewer and the source viewer, respectively, in accordance with the invention that is within the system shown in Figure 2;
  • Figure 5 is a diagram illustrating an example of a wireless navigation viewer in accordance with the invention that is within the system shown in Figure 2;
  • Figure 6 is a diagram illustrating an example of a project manager in accordance with the invention that is within the system shown in Figure 2;
  • Figure 7 is a diagram illustrating an example of a ruleset addition viewer in accordance with the invention that is within the system shown in Figure 2;
  • Figure 8 is a diagram illustrating an example of a URL definition manager in accordance with the invention that is within the system shown in Figure 2;
  • Figures 9A and 9B are diagrams illustrating examples of a wireless features manager in accordance with the invention that is within the system shown in Figure 2;
  • Figure 10 is a diagram illustrating an example of a deployment manager in accordance with the invention that is within the system shown in Figure 2;
  • Figure 11 illustrates an example of a dynamic web page wherein the content in the web page has changed between the two samples
  • Figure 12 illustrates an example of the HTML tree of the two samples of the web page of Figure 11;
  • FIG. 13 illustrates an example of the relational mark-up language (RML) code for each sample of the web page of Figure 11;
  • Figure 14 illustrates the unprocessed agnostic RML code (ARML) for each sample of the web page of Figure 11;
  • Figure 15 illustrates the preprocessed and robustified agnostic RML code (ARML) for each sample of the web page of Figure 11;
  • Figure 16 illustrates the generalized agnostic RML code (ARML) that is capable of retrieving the appropriate content from either sample of the web page shown in Figure 11;
  • Figure 17 illustrates an example of the XSL stylesheet that correctly retrieves content from either of the web page samples in accordance with the invention
  • Figures 18A and 18B are block diagrams illustrating two embodiments of the XSL generator in accordance with the invention.
  • Figure 19 is a diagram illustrating an example of group nodes in accordance with the invention.
  • Figure 20 is a flowchart illustrating the generalizer method in accordance with the invention.
  • Figure 21 is a diagram illustrating an example of the generalization method in accordance with the invention.
  • Figure 22 is a block diagram illustrating more details of the XPATH robustifier in accordance with the invention.
  • Figure 23 is a diagram illustrating an example of the paths to dynamic content
  • Figures 24A and 24B are a flowchart illustrating a node comparison method in accordance with the invention.
  • FIGS. 25A and 25B are a flowchart illustrating a turning-point node identification method in accordance with the invention.
  • Figure 26 is a diagram illustrating an example of the turning-point method in accordance with the invention.
  • Figure 27 is a flowchart illustrating a descendant identification method in accordance with the invention.
  • Figure 28 is a diagram illustrating a producer adding an atomic to a new page in accordance with the invention.
  • Figures 29a - 29c are diagrams illustrating the producer defining a ruleset in accordance with the invention.
  • Figure 30 is a diagram illustrating the producer deploying the ruleset in accordance with the invention.
  • Figure 31 is a diagram illustrating an example of an XSL stylesheet in accordance with the invention.
  • Figures 32a and 32b are diagrams illustrating an example of a new page on a cellular phone and on a Palm device, respectively.
  • the invention is particularly applicable to deconstructing a web page to generate one or more new wireless pages for one or more different wireless devices and it is in this context that the invention will be described. It will be appreciated, however, that the system and method in accordance with the invention has greater utility, such as to other types of information sources including but not limited to an XML document, an ICE document (a content syndication format) or a Reuters feed or any other type of information feed, where the information source may be deconstructed into one or more elements to generate new pages.
  • a wireless page delivery system that may be used in conjunction with the wireless web page generator in accordance with the invention will be briefly described.
  • FIG. 1 is a block diagram illustrating a wireless page delivery system 10 that may be used in conjunction with the wireless page generation system in accordance with the invention.
  • the system 10 may include one or more content providers or information sources 11, such as companies that would like to be able to deliver their web pages from a web site to one or more different wireless devices wherein each wireless device may require the web page to be formatted in a particular manner due to the size of the screen of the wireless device, the memory of the wireless device or the communications link between the wireless device and the web site.
  • the system may also include a gateway 12, a web server 13, a wireless communications system 14 to the wireless device and a wireless web page delivery portion 15.
  • the gateway may intercept an incoming HTTP request from a wireless device and route the request to the web server 13 and on to the wireless page delivery portion 15.
  • the wireless page delivery portion 15 may retrieve the actual requested HTML page, reformat the page into one or more cards and decks for the particular wireless device and send the reformatted cards and decks to the wireless device using the web server 13 and the gateway 12.
  • the wireless page delivery portion 15 may further include an appliance connection handler 16, a content connection handler 17, an XML engine 18, a layout engine 19, a rules database 20 and an XSL ruleset database 21.
  • the system may receive the incoming HTML page request, retrieve the web page, reformat the HTML page into XHTML, generate an RML document from the XHTML document, format the elements from the RML document into one or more cards and decks to form a presentation shoe that is delivered to the wireless device.
  • the interactions of the portions of the wireless page delivery system are shown in Figure 1 in more detail and further described in the above incorporated co-pending patent application. Therefore, the operation of the wireless page delivery system will not be described in any more detail.
  • FIG. 2 is a block diagram illustrating a wireless web page generation system 22 in accordance with the invention.
  • the web page generation system permits a producer or company with a web site to control the look of its one or more web pages when the web pages are downloaded to a wireless device as will be described in more detail below.
  • the wireless web page generation system 22 may include a back-end portion 23 and a front-end portion 24.
  • the front-end portion may also be referred to as a graphical user interface (GUI) tool.
  • the back-end portion may include one or more compiled JAVA programs/modules that implement the functions of the back-end as described in more detail below and the front-end may be one or more Visual Basic modules/programs that implement the functions of the front-end (GUI Tool) as described in more detail below.
  • the GUI tool and the back-end may be connected to each other using APIs as is well known.
  • the back-end 23 may further include the web page delivery portion 15 shown in Figure 1, an RML builder module 25, an XSL generator module 26 and a stylesheets database 27.
  • the function of each module will be described herein and a more detailed description of each module will be provided below.
  • the web page delivery portion 15 may generate XHTML.
  • the RML builder module 25 may generate an RML document based on a generated ruleset as described in more detail in the incorporated co-pending patent application and output the RML document into the XSL generator 26 that generates an XSL stylesheet based on the RML document.
  • the generated stylesheet may be stored in the database 27.
  • the XSL stylesheet may be used to automatically generate one or more cards from a web page so that the web page may be downloaded and displayed on a wireless device.
  • the GUI tool 24 may further include a ruleset construction toolset 28, a ruleset database 29, a project construction toolset 30 and a wireless website projects database 31.
  • the Graphical User Interface (GUI) tool enables the user to interact with the application.
  • using the GUI tool, the user can perform content selection, configuration and deployment for their wireless website project, including defining the one or more cards that contain the content of the web site.
  • the GUI has the look and feel of a standard MS Windows-type application, and conforms to MS Windows application standards.
  • the ruleset construction toolset 28 may permit the user to create and define rulesets.
  • a ruleset expresses how the wireless page delivery system 15 should transform the content and services from a desktop-centric webpage into one or more cards destined for a wireless device such as the new formatting for the cards and which content goes on which card.
  • a ruleset may also define which URLs use a particular ruleset.
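  • The association between URLs and rulesets can be pictured with a small Python sketch. The source does not say how URLs are matched, so the shell-style wildcard patterns, the `example.com` URLs and the ruleset names below are all assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical table mapping URL patterns to ruleset names.
# Earlier entries take precedence over later, more general ones.
URL_RULES = [
    ("http://www.example.com/quotes/*", "quotes-ruleset"),
    ("http://www.example.com/news/*", "news-ruleset"),
    ("http://www.example.com/*", "default-ruleset"),
]


def ruleset_for(url, rules=URL_RULES):
    """Return the name of the first ruleset whose pattern matches the URL."""
    for pattern, ruleset_name in rules:
        if fnmatchcase(url, pattern):
            return ruleset_name
    return None  # no ruleset applies to this URL
```

  • Ordering the table from specific to general is a design choice of this sketch; it lets a catch-all ruleset coexist with page-specific ones.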
  • the ruleset may also include an XSL stylesheet that specifies how the web page is transformed into one or more wireless pages.
  • the ruleset construction toolset 28 may receive the XHTML document representing a web page from the web delivery portion 15 and generate one or more rulesets based on the XHTML that may be stored in the database 29.
  • the one or more rulesets determine how the HTML web page will look on the wireless devices when the web page is converted into the wireless web page.
  • the rulesets in the database 29 may be sent to the RML builder 25 that generates the RML document and it may also be sent to the project construction toolset 30 that generates the wireless website projects for the incoming web pages as described below.
  • the finished projects are stored in the database 31.
  • a producer may interact with the GUI tool to generate a wireless website project which includes information about the look of the HTML web page on the one or more wireless devices.
  • the wireless delivery portion 15 may retrieve that web page and generate an XHTML document corresponding to the web page.
  • the user may extract or automatically extract one or more elements from the web page as described below with reference to Figures 3 and 4. From the extracted elements, known as atomics hereinafter, the user may generate the look of the wireless pages and review the wireless pages.
  • one or more rulesets are generated that capture the information about the look of the wireless pages so that the wireless page delivery system 15 (See Figure 1), when it receives a request for a web page, automatically generates the appropriate one or more cards for the wireless device based on the generated rulesets and stylesheets.
  • the wireless page delivery system automatically generates the wireless pages in accordance with the stylesheets.
  • the RML builder module 25 and the XSL generator module 26 may generate an RML document and then generate an XSL stylesheet that reflects the producer's requirements as embodied in the rulesets and the RML document.
  • the ruleset may also be used to generate project information that may be combined with the XSL stylesheet to generate a wireless website project that may then be deployed using the wireless web page delivery system as shown in Figure 1.
  • the user may specify the format of its web pages on the wireless devices.
  • the ruleset construction toolset 28 may further include a page viewer module as described with reference to Figures 4a - c and a wireless navigation viewer module as described below with reference to Figure 5.
  • the project construction toolset 30 may further include a project manager module as described below with reference to Figures 6 and 7, a URL definition manager module as described below with reference to Figure 8, a wireless feature manager module as described below with reference to Figures 9A and 9B, a deployment manager module as described below with reference to Figure 10 and an emulator module that is described below with reference to Figures 32a and 32b. Each of these modules will now be described in more detail.
  • Figure 3 is a diagram illustrating an example of a portion of a web page 40 broken down into one or more atomics and groups of atomics in accordance with the invention.
  • an example of a portion of an HTML web page 40 from an on-line stock brokerage company designate atomics while the boxes enclosing them constitute groups of atomics.
  • An atomic is the basic building block of the web page, such as a word, a quote for a stock, the name of the stock, headlines, paragraphs, links, images, and other basic elements of a page that form the most fundamental elements of the web page.
  • the system may automatically identify the atomics on a web page based on the XHTML document.
  • a group is defined as a user-defined set of atomics that can be nested into a hierarchical construction. These logical, hierarchical sets define the navigation experience for the user, and are the key to building wireless websites.
  • For example, at the top of the page 40 is a quote look-up form 41.
  • the quote look-up form 41 is made up of a group of three atomics, a "Quotes" title portion 41a, an entry box 41b and a "Go" submission button 41c.
  • a market graph 42a, table 42b, and Fool.com advertisement 42c are each related atomics and are grouped together to constitute group 42.
  • each element in the market graph 42a may also be an atomic so that "NASDAQ" is an atomic, "2756.27" is an atomic, the down arrow is an atomic and "-5.48" is an atomic.
  • TheStreet.com logo 43a, and the news stories 43b-c are each related atomics and are grouped together as group 43. All of the groups 41, 42, 43 make up the root group 40. These groups constitute the relational hierarchy for this portion of the E-TRADE website. Thus, any web page may be broken down into atomics and groups of atomics so that the atomics and groups of atomics may be reconstituted into a new web page having a different subset of atomics or a different format or the like. Now, the page viewer module of the ruleset construction toolset will be described in more detail.
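  • The hierarchy of Figure 3 can be written down as a small Python data structure. This is a simplified sketch: the `Atomic` and `Group` classes and the `select` helper are hypothetical names, the labels paraphrase the text above, and the finer-grained atomics inside the market graph 42a are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Atomic:
    ref: str    # reference numeral used in the description
    label: str


@dataclass
class Group:
    ref: str
    members: list = field(default_factory=list)  # atomics or nested groups


def atomics(node):
    """Yield every atomic under a node in document order."""
    if isinstance(node, Atomic):
        yield node
    else:
        for member in node.members:
            yield from atomics(member)


def select(node, keep):
    """Copy the hierarchy, keeping only atomics whose ref is in `keep`."""
    if isinstance(node, Atomic):
        return node if node.ref in keep else None
    kept = [m for m in (select(member, keep) for member in node.members) if m is not None]
    return Group(node.ref, kept) if kept else None


ROOT = Group("40", [
    Group("41", [
        Atomic("41a", "Quotes title"),
        Atomic("41b", "entry box"),
        Atomic("41c", "Go submission button"),
    ]),
    Group("42", [
        Atomic("42a", "market graph"),
        Atomic("42b", "table"),
        Atomic("42c", "Fool.com advertisement"),
    ]),
    Group("43", [
        Atomic("43a", "TheStreet.com logo"),
        Atomic("43b", "news story"),
        Atomic("43c", "news story"),
    ]),
])
```

  • Calling `select(ROOT, {"41b", "43b"})` returns a reduced hierarchy containing only groups 41 and 43, which mirrors the idea of reconstituting a page from a subset of its atomics.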
  • FIGS. 4a - 4c are diagrams illustrating an example of an integrated desktop user interface 50 for the GUI tool in accordance with the invention that is within the system shown in Figure 2.
  • the user interface is designed to permit the user to view all of the tools in the ruleset construction toolset simultaneously as shown.
  • the user interface 50 may include a page viewer portion 52, a page navigator portion 54, an atomic property portion 56, and one or more tabs 58 that change the item being viewed in the page viewer portion 52.
  • the tabs may include an HTML tab 60 that permits the user to view the HTML code, a construction tab 62 that permits the user to view the graphical page as shown in Figure 3, and a source tab 64 that permits the user to view the source of the page.
  • the page viewer 52 is the primary work environment for ruleset construction. It displays the desktop-centric webpages that users wish to configure for wireless delivery. Users select elements (atomics or groups of atomics) from the targeted webpage with a mouse, and then set properties for each element. The page viewer permits the user to view the page from which atomics are being extracted or the page that is being created using the extracted atomics in the different item view modes controlled by the tabs.
  • In Figure 4a, an auction page for eBay is shown. Now, the atomic extraction process will be briefly described.
  • a modified HTML editor such as Microsoft® DHTML
  • the typical HTML editor may display the HTML as it is typically rendered, but doesn't allow click-throughs to links and also does not allow selection of particular HTML subtrees. For instance, if a piece of text is selected, the typical HTML editor tells the user which node in the HTML tree contains all the selected text.
  • the modified HTML editor in accordance with the invention has been modified to pass back the HTML node with the selected content based on the user click.
  • the atomic extractor then digs into the underlying tree node structure to determine the path to the selected element by iterating through its valid parents (the editor control inserts a number of design-time tags that do not exist outside the editor, and some validations are applied at this step as well).
  • the path to the selected content represented by the atomic is the absolute path to that atomic and is used later to describe the location of the particular content.
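  • The walk through valid parents can be sketched in Python as follows. The `EditorNode` class, the `design_time` flag, the `editwrap` tag name and the XPath-like output notation are assumptions for illustration; the source does not describe the editor's node API or the exact path syntax.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity
class EditorNode:
    tag: str
    design_time: bool = False  # True for tags that exist only inside the editor
    children: list = field(default_factory=list)
    parent: Optional["EditorNode"] = None

    def add(self, child: "EditorNode") -> "EditorNode":
        child.parent = self
        self.children.append(child)
        return child


def valid_parent(node):
    """Nearest ancestor that is not a design-time tag."""
    parent = node.parent
    while parent is not None and parent.design_time:
        parent = parent.parent
    return parent


def valid_children(node):
    """Children as they would appear outside the editor (wrappers flattened)."""
    result = []
    for child in node.children:
        if child.design_time:
            result.extend(valid_children(child))
        else:
            result.append(child)
    return result


def absolute_path(node):
    """Build a position-based path from the root down to the selected node."""
    steps = []
    while True:
        parent = valid_parent(node)
        if parent is None:
            steps.append("/" + node.tag)
            break
        same_tag = [c for c in valid_children(parent) if c.tag == node.tag]
        position = next(i for i, c in enumerate(same_tag, start=1) if c is node)
        steps.append(f"/{node.tag}[{position}]")
        node = parent
    return "".join(reversed(steps))


# Toy page: the selected paragraph sits inside an editor-only wrapper.
html = EditorNode("html")
body = html.add(EditorNode("body"))
body.add(EditorNode("p"))
wrapper = body.add(EditorNode("editwrap", design_time=True))
selected = wrapper.add(EditorNode("p"))
```

  • Here `absolute_path(selected)` skips the editor-only wrapper, so the resulting path describes the page as it exists outside the editor.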
  • the page navigator 54 is very similar to the construction view of the page viewer and can initiate all of the construction viewer's functionality. However, the page navigator 54 displays the selected page elements in a tree structure rather than a graphical format.
  • the tree structure is a hierarchical representation of how the selected elements are structured into groups and atomics.
  • the page navigator portion 54 illustrates the hierarchical structure of the page.
  • the page is represented as groups and atomics arranged in the hierarchical relationship as shown.
  • the hierarchical relationship of the atomics and groups may be manually generated by the user by dragging an element from the page shown in page viewer portion 52 and dropping it into the navigator portion 54.
  • the system may automatically extract the atomics and groups from the web page as described above. The user may then arrange the atomics and groups in the navigator portion 54 to accurately reflect the page.
  • the properties of that atomic may be displayed in the atomic property portion 56.
  • the atomic property portion 56 allows the user to view and configure properties for each element in the wireless website.
  • the properties may express certain attributes for each element (for example, a property might define which classes of wireless devices may display the element).
  • the properties of the atomic may include its name, its class, its RML path, its HTML path, its tag, a sample of the atomic and any other information.
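  • The listed properties map naturally onto a record type. The Python sketch below is an illustration only: the field names follow the list above, but the example values, the device class names and the rule that an empty `device_classes` set means "any device" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicProperties:
    name: str
    atomic_class: str
    rml_path: str
    html_path: str
    tag: str
    sample: str
    # Classes of wireless devices allowed to display the atomic.
    # Assumption of this sketch: an empty set means every device class.
    device_classes: frozenset = frozenset()


def displayable(atomics, device_class):
    """Return the atomics that the given device class may display."""
    return [
        a for a in atomics
        if not a.device_classes or device_class in a.device_classes
    ]


# Hypothetical property records for two atomics of an auction page.
TITLE = AtomicProperties(
    name="item title", atomic_class="text",
    rml_path="/item/title", html_path="/html/body[1]/h1[1]",
    tag="h1", sample="Vintage camera",
)
PHOTO = AtomicProperties(
    name="item photo", atomic_class="image",
    rml_path="/item/photo", html_path="/html/body[1]/img[1]",
    tag="img", sample="camera.jpg",
    device_classes=frozenset({"pda"}),
)
```

  • With these records, a device in the hypothetical "phone" class would receive only the title, while a "pda" device would receive both atomics.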
  • the GUI tool interface permits the user to view all of the above portions simultaneously as shown in Figure 4 so that the user does not have to switch between screens or applications to see all of the necessary information to deconstruct the page and develop a new page.
  • Figure 4b illustrates the HTML viewer 60 wherein the graphical page generated based on the HTML code is shown.
  • Figure 4c illustrates the source viewer 64 wherein the actual HTML code that generates the graphical page is shown.
  • FIG 5 is a diagram illustrating an example of a wireless navigation viewer module 70 in accordance with the invention within the system shown in Figure 2.
  • the wireless navigation viewer 70 allows the user to graphically view the wireless navigational structure of their one or more cards generated based on the ruleset that was created using the ruleset construction toolset as described above.
  • a single webpage is typically delivered to a wireless device in a series of presentation decks. These decks contain one or more cards wherein each card contains webpage content that has been formatted appropriately for the screen of each wireless device.
  • the wireless navigation viewer module presents these decks in graphical form to the user so that the user can ensure that the web page has been divided into the cards in an appropriate manner.
  • the user may review the results of the ruleset generated using the ruleset construction toolset.
  • the web page has been broken down by the user's ruleset into a first card 72 containing an auction item that is linked to three other cards 74, 76, 78 containing bid statistics, seller information and a description of the item, respectively.
  • the bid statistics card 74 may have a bid card 79 and a how to bid card 80 linked to it.
  • the description card 78 may have an image card 82 linked to it that shows an image of the item.
  • the project construction toolset 30 allows users to combine one or more rulesets into a wireless website project (WWP). Using the project construction toolset, users can:
  • the project construction toolset further comprises a project manager module, a URL definition manager module, a wireless feature manager module, a deployment manager module and an emulator module. Each of these modules will now be described.
  • Figure 6 is a diagram illustrating an example of a user interface 90 for the project manager in accordance with the invention.
  • the project manager allows the user to maintain the set of rulesets that make up a WWP.
  • users can modify the WWP by adding or removing previously created rulesets as described below with reference to Figure 7.
  • a list 92 of one or more different rulesets is shown. For each ruleset, the user that created the ruleset, its last update date, its status (deployed or modified) and the URL rule are listed.
  • the user interface 90 may include one or more buttons 94 that permit the user to, for example, add a ruleset, deploy a ruleset, define a URL, finish the project management or cancel the prior command.
  • FIG. 7 is a diagram illustrating an example of a ruleset addition viewer user interface 100 in accordance with the invention.
  • the ruleset addition user interface permits the user to add a ruleset, previously created using the ruleset construction toolset, to a particular project.
  • the user may select from one or more existing rulesets (StartPage.rs, Categories.rs, ItemList.rs and ItemDescription.rs in this example) and add them into the project.
  • the user has selected the ItemDescription.rs ruleset which defines the cards and decks and formats used to present the description of an item to the user.
  • the URL definition manager in accordance with the invention will be described.
  • FIG 8 is a diagram illustrating an example of a URL definition manager user interface 110 in accordance with the invention.
  • the URL definition manager allows the user to maintain the URL definition tables for the project.
  • the URL definition tables allow the wireless page delivery system 15 shown in Figure 2 to select the appropriate ruleset for each URL request.
  • this module enables the user to define how URLs are mapped to rulesets. For example, as shown in Figure 8, the ItemDescription.rs has been selected and the URL definition manager lists one or more tokens 112 that are used to identify whether the particular ruleset applies to the particular URL.
  • a circle 114 may be filled in for each token indicating whether the particular token must be present in the URL for the ruleset to be used ("Must Have"), whether the token in the URL is not relevant ("Don't Care") or whether the token cannot exist in a URL for the ruleset ("Must Not Have").
  • a URL must have the following tokens: eBay; com; aw-cgi; and eBayISAPI.dll?ViewItem&Item in order to invoke the ruleset.
  • a URL that is "http://www.elBay.com" would not invoke the ruleset since the URL does not contain eBay.
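The token table described above might be evaluated along the following lines. This is a hedged sketch rather than the system's actual matching logic: treating a token as "present" when it occurs as a case-sensitive substring of the URL is an assumption, the `www` rule is added only to show a "Don't Care" entry, and the first test URL is made up.

```python
MUST_HAVE = "Must Have"
DONT_CARE = "Don't Care"
MUST_NOT_HAVE = "Must Not Have"


def ruleset_applies(url, rules):
    """Return True if the URL satisfies every token rule of a ruleset.

    Assumption: a token counts as present when it appears anywhere in the
    URL as a case-sensitive substring.
    """
    for token, requirement in rules.items():
        present = token in url
        if requirement == MUST_HAVE and not present:
            return False
        if requirement == MUST_NOT_HAVE and present:
            return False
        # DONT_CARE tokens never affect the outcome.
    return True


# Hypothetical URL definition table for one ruleset.
item_description_rules = {
    "eBay": MUST_HAVE,
    "com": MUST_HAVE,
    "aw-cgi": MUST_HAVE,
    "www": DONT_CARE,
}

print(ruleset_applies("http://cgi.eBay.com/aw-cgi/item", item_description_rules))
print(ruleset_applies("http://www.elBay.com", item_description_rules))
```

The second URL is rejected because the required token "eBay" does not occur in it, which matches the example given in the text.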
  • Figures 9A and 9B are diagrams illustrating examples of user interfaces 120, 122 for the wireless features manager in accordance with the invention.
  • the wireless features manager is used to include certain specialized wireless features that are not available from desktop-centric website content.
  • the wireless features manager allows the user to integrate these features into their wireless website.
  • the user can include specialized messaging features into the WWP or revenue features as will be briefly described as an example of the wireless features that may be added.
  • Figures 9A and 9B illustrate an example for including an E-Revenue feature to the wireless page.
  • the wireless feature manager permits the user to add E-Commerce features (including secure sockets and integrated payments) or to add promotions features (including wireless advertising, couponing and sponsorships).
  • the user interface 122 for adding the wireless advertising is shown in more detail.
  • the user may specify, for example, the advertising partner and that partner's URL along with information about how often the advertising is going to be shown to the wireless device user.
  • one or more wireless devices may be chosen for the advertising and each different wireless device may have a different level of advertising. For example, for some selected wireless devices, such as Internet phones and handhelds, the advertisement may be repeated on every deck presented to the user, whereas on a WAP phone the advertisement is shown once per session.
  • the user may customize the features associated with the wireless page using the wireless features manager. Now, the deployment manager will be described in more detail.
  • FIG 10 is a diagram illustrating an example of a deployment manager user interface 130 in accordance with the invention.
  • the deployment manager allows the user to control the deployment of the WWP.
  • the users can use the deployment manager to deploy the WWP either to a testing environment or to a production environment.
  • the deployment manager also includes deployment version control, which allows users to return to previous versions as necessary.
  • Figure 10 shows an example of the deployment manager user interface showing the versions for a project called "whatever. nmd". Now, the back-end of the wireless page generating system in accordance with the invention will be described in more detail.
  • the back-end includes the RML builder 25, the XSL generator 26 and the stylesheets database 27 as shown in Figure 2.
  • the RML builder 25 stores and updates agnostic RML documents based on the rulesets generated by the GUI. These documents contain user-specified project data, which has been gathered via the GUI. The data includes not only user-designated groups and atomics, but also user-defined attributes and additional rules for handling content. Agnostic RML documents group data hierarchically, mirroring the structure of the RML documents that are used as input to the Layout Engine.
  • a web page dynamic in this example
  • robustified agnostic RML code and finally into a XSL stylesheet that will extract the desired content from the dynamic web page will be described.
  • Figure 11 illustrates an example of a dynamic web page wherein the content in two samples 131, 132 of the web page changes.
  • the web page may be dynamic since it changes from the first sample 131 to the second sample 132 which makes it more difficult to accurately extract content as described below.
  • the first sample 131 may include a group 133 and one or more atomics 134 in the group.
  • the atomics may be "Yin” and "Yang”.
  • the second sample 132 may also have a group 133 and atomics 134. However, in the second sample, the number of atomics in the group 133 has increased by one so that the atomics are "Yin", "Yang” and "Dragon".
  • the first sample may be the web page at a first predetermined time and the second sample may be the same web page at a second predetermined time when the content has changed.
  • the people generating the wireless web pages must redo all of the wireless web pages when the web page changes as shown in Figure 11.
  • the appropriate content from the web page may be extracted.
  • the web page is structurally represented by its HTML tree as shown in Figure 12.
  • Figure 12 illustrates an example of the HTML trees corresponding to the web page samples of Figure 11.
  • each sample of the web page is shown and the differences between the two samples are evident.
  • the upper parts of the HTML tree are identical since the content and structure of the web page is identical.
  • a table structure is a parent of a TR group node 133 which is the parent of two TD nodes 134 that contain the atomics "Yin” and "Yang”.
  • the HTML tree for the second sample has the same table parent node and the same TR group node 133, but there are three TD nodes that contain the three atomics, "Yin", "Yang" and "Dragon".
  • FIG 13 illustrates an example of the relational mark-up language (RML) code 135 generated based on the web page samples of Figure 11.
  • the RML code may be known as orthodox RML since the code contains the content in contrast to agnostic RML (ARML) code that contains only queries into the structure of the web page, but has the same relational hierarchy as the orthodox RML.
  • the RML code in the current paradigm is typically constructed by applying an XSL stylesheet to the HTML page to convert the HTML to the RML format through the selection of particular pieces of desired content.
  • the RML code for each sample is different since the second sample contains an extra atomic as described above.
  • a group tag is generated for each group 133 and an atomic tag is generated for each atomic 134.
  • the RML captures the desired content of the web page and contains it in a hierarchical structure of atomics and groups of atomics.
  • Figure 14 illustrates an example of unprocessed agnostic RML code (ARML) 136 for each sample.
  • these represent the ARML before any processing, where each atomic contains a fully-specific path.
  • the ARML code is preprocessed, then robustified and generalized as described below with reference to Figures 15 and 16.
  • a sample of the robustified ARML code 137 for each sample of the web page is shown in Figure 15.
  • the paths to the desired content are robustified so that even if the web page is dynamic and changes (such as the change between the first sample 131 and the second sample 132 shown in Figure 11), the XSL stylesheet shown in Figure 17 will extract the correct content from the web page.
  • Figure 16 illustrates the generalized ARML code in accordance with the invention.
  • Figure 17 illustrates an example of the XSL stylesheet 139 generated based on the ARML code of Figure 16 that correctly processes both of the web page samples in accordance with the invention.
  • two embodiments of the XSL generator 26 will be described in more detail.
  • FIGS 18A and 18B are block diagrams illustrating more details of two embodiments of the XSL generator 26 in accordance with the invention.
  • the XSL generator receives the agnostic RML from the RML builder and uses it to construct an XSL stylesheet.
  • the XSL stylesheet is used by the wireless page delivery system to automatically generate one or more cards and decks based on the HTML web page.
  • the XSL engine in the wireless page delivery system 15 (known as Catalyst™) will later refer to this stylesheet when it converts XHTML to orthodox RML for reformatting a web page into wireless pages for wireless devices.
  • the XSL generator may automatically create the stylesheet based on the user input as stated above.
  • the XSL generator may further include an XPATH robustifier as described below that automatically parses through the XPATHs in the RML document and attempts to generalize the paths so that the XSL stylesheets may still extract the proper atomics or groups even when the original web page changes. For example, if a web site typically has its top stories in a table with cells for each story, then, when the web site adds an extra top story into the table, the XPATH robustifier has modified the original stylesheet so that the stylesheet still retrieves the correct content as described in more detail below even with the extra top story. Thus, even dynamic web pages with changing content may be automatically processed in accordance with the invention to properly generate one or more wireless pages or cards.
  • the XSL generator 26 may include one or more modules that may be software applications being executed by a CPU on a server.
  • the XSL generator 26 may include an XPATH preprocessor module 140, an XPATH robustifier module 142 and an XSL writer module 144.
  • the XPATH robustifier 142 is removed. Without the XPATH robustifier, the system generates XSL stylesheets that will extract the appropriate information from a web site, but may not be able to extract the proper information if the web site is dynamic.
  • a dynamic web page may also be properly processed using the stylesheets because the XPATH robustifier attempts to generalize the XPATHs in the stylesheet so that changes in the web page do not disrupt the stylesheet as described below in more detail.
  • Each module in the XSL generator 26 will now be described in more detail.
  • the role of the XPATH preprocessor 140 is to define the relative paths to selected content in the RML.
  • An XHTML tree uses an absolute path to the selected content which records every node between the root node and the selected content.
  • using relative paths generalizes the stylesheet considerably.
  • the relative paths may be used by the XPATH robustifier for the generalization method that is described below with reference to Figures 20 and 21.
  • the XPATH preprocessor first determines the group nodes for selected content. The group nodes are the lowest parent nodes linked to all of the grouped selected content.
  • FIG 19 is a diagram illustrating three examples of identifying the group nodes in accordance with the invention.
  • an XHTML tree 150 is shown with the absolute path to the selected content 152 (shown as shaded in the diagram) along with an RML tree 154 including a group node 156 and the selected content 152.
  • the selected content 152 has the absolute paths "ABDG" and "ABDH" in the XHTML tree, and the group node "ABD" is the parent of both pieces of selected content, so the relative paths run from the group node "ABD" to each piece of selected content (Atomic G and Atomic H).
  • selected content I and E have absolute paths "ABDI" and "ABE", so the group node that contains the parents of the selected content is "AB" and the RML tree 154 contains the group node and the two pieces of selected content (Atomic DI and Atomic E) as shown.
  • the selected content E and J have absolute paths "ABE" and "ACFJ", so the group node that contains the parents of the selected content is "A" and the RML tree 154 contains the group node and the two selected content nodes (Atomic BE and Atomic CFJ) as shown.
  • the XPATH preprocessor attempts to simplify the path to the selected content by determining the lowest group node in the tree that is the parent for the selected content.
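The three examples above can be reproduced with a short sketch. It assumes, as in the figure, that each node name is a single letter so that an absolute path is simply a string; the function name is hypothetical and the common-prefix approach is one possible way to find the lowest shared parent, not necessarily the one used by the preprocessor.

```python
from os.path import commonprefix


def group_and_relative_paths(absolute_paths):
    """Find the lowest shared ancestor and the paths relative to it.

    Assumes single-letter node names, so "ABDG" means A/B/D/G.
    """
    shortest = min(len(p) for p in absolute_paths)
    # The group node must be a proper ancestor of every selected node,
    # so the shared prefix is capped one level above the shortest path.
    group = commonprefix(absolute_paths)[: shortest - 1]
    relative = [p[len(group):] for p in absolute_paths]
    return group, relative


print(group_and_relative_paths(["ABDG", "ABDH"]))  # ('ABD', ['G', 'H'])
print(group_and_relative_paths(["ABDI", "ABE"]))   # ('AB', ['DI', 'E'])
print(group_and_relative_paths(["ABE", "ACFJ"]))   # ('A', ['BE', 'CFJ'])
```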
  • the XSL writer module 144 creates an XSL stylesheet from the robustified agnostic RML. It does this by constructing a template for each node in the tree (if generalized, some nodes may point to several different pieces of content in the tree). An element handler writes the code for each template. If nodes contain content, the element handler must also implement user-defined methods for handling the content in a specific way. For example, image display may be handled in several ways:
  • the element handler creates code that implements that decision in XSL. These selections are considered auxiliary (that is, non-attribute) preferences. Now, the generalizing method that is carried out in the XPATH robustifier will be described.
  • Figure 20 is a flowchart illustrating the generalizer method 160 in accordance with the invention.
  • the generalizer method is one example of a technique in accordance with the invention for generalizing the paths to the selected content. Another technique is embodied in the robustifier as described below. Both of the techniques may or may not be used with the wireless page generation system. Both techniques may be used to help the wireless page generation system handle dynamic web pages in which the content may change.
  • Generalization is the process of applying the content selection and formatting of one element to other, similar elements. Generalization takes into account that elements targeted for generalization may occur an arbitrary number of times within an XHTML page.
  • the generalization forces a unit that generates XSL stylesheets (known as a XSL Generator) to account for this by applying templates to similar elements in order to treat them in the same way.
  • the paths in the ARML are generalized and then the ARML is used to create an XSL stylesheet which is then applied to an XHTML page to create RML.
  • ARML has paths which query the XHTML for content and RML contains the actual content which was accessed during the query.
  • pre-processing or "pre-processor” refers to a software module that takes the absolute paths stored at the leaves of the ARML and relativizes them based upon the hierarchy of groups and atomics in the ARML.
  • the generalization process is a much more complicated process in which the actual paths are changed, using XPath, so that they can match more than one node at a time (which helps in handling changing numbers of similar items on a page).
  • the generalization method 160 may involve a combination of user input and automatic computation.
  • the user selects an example of a type of group or atomic (groups and atomics are not explained) in step 162 that may dynamically change in number and then selects or de-selects other atomics or groups which are similar.
  • other atomics or groups can be automatically selected and created by adjusting the amount of content in the XHTML page which should be generalized.
  • the user may elect to remove certain particular elements from the new selected content, or to move further up or down the XHTML tree in step 164 to make the content selection larger or smaller.
  • the user views the selection, and either approves the change or provides more input.
  • a set of generalized atomics or groups is then defined.
  • the method is based on tree nodes wherein every element is associated with an RML node that also has a corresponding node in the XHTML tree. Every XHTML node is a parent to a sub-tree (the set of all descendant nodes of the parent node).
  • In step 166, it is determined whether the user has selected more content, and the method loops back to step 162 if more content is being generalized. If there is no more content to generalize, then a general XPATH expression for the generalization is computed in step 168. The method is then completed.
  • Figure 21 is a diagram illustrating an example of the generalization method in accordance with the invention.
  • a user selects an atomic representing a "D" node (shaded in the diagram) in the XHTML tree (See Section I of the diagram).
  • This "D" node has a "C” tag as its parent in the XHTML structure.
  • the user can move the atomic path up one level to the "C” tag thereby converting the atomic to a group where everything underneath the "C” tag becomes an atomic child (See Section II).
  • this method handles a change in number of children (See Section III) since the path is directed to "C" so that any number of atomics "D" underneath "C" will be retrieved. For example, if "C" represents the top stories in a news web page and "D" represents each top story, the addition of extra top stories into the web site will still be retrieved. In addition, the generalized node "C" may overlook newly inserted undesired children (See Section IV). A similar method may be used to handle the generalization of groups as well. The robustifier will now be described in more detail.
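The effect of generalizing a path can be illustrated with Python's standard `xml.etree.ElementTree` module, which supports a small XPath subset. The two page samples and the `extract` helper are made up for this sketch; only the node names "C" and "D" follow the figure.

```python
import xml.etree.ElementTree as ET

# First sample: two stories under "C". Second sample: a third story has
# been added and an unrelated "X" child has been inserted.
page_v1 = ET.fromstring(
    "<A><B><C><D>story 1</D><D>story 2</D></C></B></A>"
)
page_v2 = ET.fromstring(
    "<A><B><C><D>story 1</D><X>ad</X><D>story 2</D><D>story 3</D></C></B></A>"
)

specific = "B/C/D[1]"   # fully specific: only the first "D" under "C"
generalized = "B/C/D"   # generalized: every "D" under "C"


def extract(page, path):
    """Return the text of every node matched by the path."""
    return [element.text for element in page.findall(path)]


print(extract(page_v1, specific))      # ['story 1']
print(extract(page_v1, generalized))   # ['story 1', 'story 2']
print(extract(page_v2, generalized))   # ['story 1', 'story 2', 'story 3']
```

The generalized path picks up the added story and ignores the inserted "X" child, which corresponds to the behavior described for Sections III and IV.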
  • Figure 22 is a block diagram illustrating more details of the XPATH robustifier 142 in accordance with the invention.
  • the robustifier generalizes a path through a hierarchical structure so that, even if the underlying hierarchical structure or the content in the hierarchical structure changes, the content may still be located in the hierarchical structure during a search.
  • the robustifier may also be referred to as the XPATH robustifier.
  • the robustifier creates paths to selected content that remain valid even after new HTML or XML nodes have been inserted into the structure to be queried. This is accomplished by making XPath node selection as non-specific as possible.
  • the robustifier thus searches for information that is specific to the selected XHTML node, as compared to all similar nodes in an XHTML sub-tree, or the entire XHTML structure. By matching nodes according to this type of information, it creates less specific paths to the same unique set of content.
  • Figure 23 is a diagram illustrating an example of the paths to content stored in a dynamic structure which demonstrates the need for this approach.
  • two atomics 180,182, shown as shaded nodes in the XHTML tree have been selected for placement into a group.
  • the path to both atomics begins from the "A" node in the XHTML. After that point, the two paths diverge and the paths "B" and "CDEF” find the individual atomics (shown in Section I).
  • In Section II, the structure of the web page, and hence the XHTML, has changed and the "X" nodes 184 have been introduced into the structure.
  • robustified path descriptions can still locate the selected content so that the dynamic web page does not interfere with the extraction of the content.
  • the robustified paths are "descendant::B" and "descendant::E/descendant::F/."
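A minimal sketch of why descendant-axis paths survive inserted nodes is given below. It uses `xml.etree.ElementTree`, whose `.//tag` syntax plays the role of the `descendant::` axis. The two trees are assumptions loosely modeled on the figure: the second "F" node and the exact positions of the "X" nodes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Section I (assumed layout): atomics at A/B and A/C/D/E/F, plus an
# unwanted "F" that is not underneath "E".
before = ET.fromstring(
    "<A><B>b</B><C><D><E><F>wanted</F></E><F>other</F></D></C></A>"
)
# Section II (assumed layout): "X" nodes have been inserted into the page.
after = ET.fromstring(
    "<A><X><B>b</B></X>"
    "<C><X><D><E><X><F>wanted</F></X></E><F>other</F></D></X></C></A>"
)


def texts(tree, path):
    """Return the text of every node matched by the path."""
    return [element.text for element in tree.findall(path)]


# Fully specific path: works before the change, finds nothing afterwards.
print(texts(before, "C/D/E/F"))   # ['wanted']
print(texts(after, "C/D/E/F"))    # []

# Robustified paths, equivalent to descendant::B and descendant::E/descendant::F.
print(texts(after, ".//B"))       # ['b']
print(texts(after, ".//E//F"))    # ['wanted']
```

Routing the second path through "E" is what keeps the unwanted "F" out of the result; a bare `.//F` would match both.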
  • the XPATH robustifier may include one or more modules that are implemented using one or more software applications or modules.
  • the modules may include a comparer module 170, a turning-point node identifier module 172 and a module for determining whether or not a descendant axis can be used, also called an axis verifier 174.
  • These modules function to automatically determine robust paths to selected content so that, even if the structure of the hierarchical data storage changes such as with a dynamic web page, the appropriate content from the dynamic web page may be located.
  • the robustification process can be split up into several individual methods implemented by the modules described above.
  • the methods include: a comparison method for determining how similar a set of XHTML nodes are to each other to identify a node of interest, and in particular how a node containing desired content is different from other nodes of the same type in an XHTML tree or sub-tree; a method to find parent "turning-point" nodes of a desired content-containing node that help to improve the search, by effectively partitioning the page into smaller regions; and a method to determine whether or not a descendant:: axis can be placed in front of turning-point nodes.
  • the first assumption is that certain particular nodes, such as the turning-point nodes, the content-containing nodes and the relative group nodes, will not change type or be removed from the page.
  • a relative group node is the node in the XHTML tree having the smallest sub-tree that contains all of the children of an RML group node.
  • the second assumption is that these nodes will retain their original relationships in terms of what is a descendant of what in the tree.
  • the robustifier will create paths that will work for any other changes that may occur to the XHTML tree structure so that various dynamic web pages may have wireless pages generated from them in accordance with the invention.
  • a web site may have one or more news stories wherein each story is contained in a cell of a table. If the web site adds another story by creating another cell in the table, the generalized paths will locate and extract the extra story since the type of the atomic, a table, did not change.
  • this process may be made recursive.
  • the path from the relative group node to the turning point node is another sub-path that can then be robustified the same way.
  • predicates can be found for the turning point node and another turning point node can be found between the original turning point node and the relative group node. Then the process can be repeated for the new turning point node, and so on.
  • the entire robustification process is organized to prefer certain types of results.
  • the most preferred result is that the node can be identified by its local properties alone. This is tested first using the comparison method described below. If this fails, then the node is tested to see if it can be identified by local properties, given that it is within a specific sub-tree of the relative group's sub-tree where the root node of the specific sub-tree is a turning-point node. If that fails, then the future recursive version of the algorithm will attempt to find a specific sub-tree of the tree inside another specific sub-tree of the relative group's sub-tree in which the content can be uniquely identified, and so on and so on recursively by finding more than one turning point node. If all of this fails, then the only possibility is the full, specific path to the content. Now, the node comparison method in accordance with the invention will be described.
  • Figures 24A and 24B are a flowchart illustrating a node comparison method 190 in accordance with the invention for identifying a node of interest.
  • the robustifier and the comparison method must determine how to distinguish the desired content from content that is of a similar type, yet is undesired.
  • one of the "F" nodes is desired, while the other is not, and it is important to determine how to distinguish the two nodes to ensure that the robustification process extracts the proper node.
  • the robustifier selects and defines a "node of interest" which is what has been previously referred to as the desired node.
  • the node of interest is defined by a set of specifiers, which can be used in place of the specific path to distinguish the node of interest from all other "mismatch" nodes wherein the specifiers may include characteristics, such as which attributes or children the node has. These specifiers allow the robustifier to compare each potential mismatch in the parent node's sub-tree to the node of interest, and are described in more detail below.
  • the comparison method uses the following information about a node as a basis for its comparison: the siblings of the node, the descendants of the node, the direct children of the node, the attributes of the node and the position of the node among its siblings.
  • This set of information can easily be changed to include other information, such as its direct parent, the attributes of the child nodes, etc.
  • the actual set of information being used for the comparison is not necessarily a fundamental aspect of the method and may vary. All that is necessary for the method is that there is a set of information to be used as a basis of comparison, and it can easily be changed to accommodate varying needs.
  • the comparison method attempts to determine what is unique about the node we are interested in, which is often a node containing some content, but may also be a relative group node.
  • a particular class called an XHTMLInformer, is used to contain the information about a node, and this class is also used to make comparisons.
  • two XHTMLInformers can be intersected, just like sets in mathematics, returning an XHTMLInformer containing only that information that is shared between two nodes.
  • the other operations include differences and unions, which return what is in one node but not the other, or what is in either node, respectively.
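The set-like behavior of such a class could look like the sketch below. The class name `Informer`, its method names and the way facts are encoded as tuples are all assumptions for illustration; the specification does not give the actual interface of XHTMLInformer.

```python
class Informer:
    """Hypothetical stand-in for XHTMLInformer: a set of facts about a node."""

    def __init__(self, facts=()):
        self.facts = frozenset(facts)

    def intersect(self, other):
        """Facts shared by both nodes."""
        return Informer(self.facts & other.facts)

    def difference(self, other):
        """Facts in this node that are not in the other node."""
        return Informer(self.facts - other.facts)

    def union(self, other):
        """Facts found in either node."""
        return Informer(self.facts | other.facts)

    def is_empty(self):
        return not self.facts


# Made-up facts: attributes, direct children and sibling position.
wanted = Informer({("attr", "class=headline"), ("child", "img"), ("position", 2)})
other = Informer({("child", "img"), ("position", 3)})

print(sorted(wanted.difference(other).facts, key=str))
print(sorted(wanted.intersect(other).facts, key=str))
```

Using immutable sets keeps each operation free of side effects, so the same informer can safely be compared against many other nodes.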
  • Each XHTML node of interest is represented by an RML node in the current paradigm within which this invention is being described, and each RML node has a parent node (except for the root <rml> node).
  • the preprocessing method guarantees that the XHTML node represented by an RML node's parent is in fact an ancestor in the XHTML tree of that RML node's XHTML node.
  • the method begins by searching using the entire sub-tree of the XHTML node corresponding to the RML parent of the RML node of interest in step 192.
  • This subtree is traversed in step 194 to find all XHTML nodes that are of the same type as the XHTML node corresponding to the RML node of interest in step 196.
  • Each node that is found during the traversal is compared to the XHTML node of interest in step 198 by using a differencing operation.
  • the differencing operation finds, for each found node, what makes it different from the XHTML node of interest.
  • the result of the differencing is stored in an XHTMLInformer in step 200 that contains what is in the XHTML node of interest that is not in the other node.
  • the method determines if there are more located nodes in step 202 and loops back to step 198 to process the other located nodes. If all of the nodes are processed to generate the XHTMLInformers, the method continues.
  • the above process results in a list of XHTMLInformers, each of which contains information about the node of interest that makes it different from a particular other node in the tree. All of these XHTMLInformers are intersected in step 204, yielding a single set of information (stored in another XHTMLInformer) that describes what makes the node of interest different from all the other nodes. This is called the intersection test in step 206: if the XHTMLInformer at the end of this comparison is non-empty, there is something about the node of interest that makes it unique in step 208. This information may then be placed in a predicate in step 208 to specify this node uniquely in an XPath expression.
  • In step 210, the entire path can be replaced with "descendant::node[information]", where the information is that which is contained in the XHTMLInformer at the end of the comparison at step 208, and the robustifier method is then finished processing this particular node of interest.
  • If the intersection test fails in step 206, there is no single piece of information that is unique to the node of interest. However, there may be one piece of information distinguishing it from some of the nodes and another piece of information distinguishing it from the rest.
  • the differences can be the following four sets: [ABC],[BCD],[CDA],[DAB]. Note that the intersection of these four sets is the null set, but the union of the sets is [ABCD].
  • the union test step 212 and it only works if none of the differences are empty (if a difference is empty, it means that the node which was differenced with the node of interest is indistinguishable from the node of interest).
  • step 214 If the union test is successful (step 214), then the method loops to steps 208 and 210 in which the path is replaced.
  • the difference test or the union test is successful, not only have we determined that no relative path is necessary, but we've also determined what is specific about the node. That information can be directly used as a predicate in the XPath expression. If both tests fail, then a relative path must be used for the node of interest since no other more generic assignment uniquely identifies the node. Now, the turning-point node identification method in accordance with the invention will be described.
  • FIGS. 25A and 25B are a flowchart illustrating a turning-point node identification method 220 in accordance with the invention.
  • the robustifier must use another technique than the comparison method to help identify the correct path to the selected content.
  • the finding of turning-point nodes is one of those methods of identification.
  • Turning point nodes are defined as nodes that have been identified as important components of the path to the content, or "vital turning points" in the tree.
  • the "F" node selection set is narrowed to the one desired "F" node. Therefore, the "E" node is a vital turning point in the tree in that it can be used to identify the desired node while avoiding the unwanted node.
  • a page 217 is presented as a matrix of tables.
  • tables use table row tags and table column tags to partition the page into regions where content can be placed.
  • the ability to identify content by a particular table row and column is essentially equivalent to identifying a particular node in the XHTML as a turning point.
  • the robustifier can locate selected content while only searching a particular region on the rendered page.
  • a turning-point node 218 and a turning-point group 219 are shown.
  • a turning-point node somewhere along the relative path may be located using the turning-point identification method.
  • the turning-point method makes the search space smaller and hopefully the possibility of finding something unique about the node of interest greater.
  • we began at the top of the sub-tree at the XHTML node at the top of the relative path stored at the RML node of interest in step 222.
  • the sub-tree can be made smaller by moving down one level in the tree to the next parent, and then the entire process of performing the intersection/union test steps can be repeated as shown in steps 224-244 that correspond to the steps in the comparison algorithm and will not be described herein.
  • step 246 If the intersection/union test suddenly succeeds on this smaller sub-tree, then that next parent is designated a turning-point node in step 246.
  • the turning-point node becomes important because it designates a smaller sub-tree within which the node of interest can be uniquely specified when it couldn't be uniquely specified from within a larger sub-tree. If no turning-point is found, i.e. the intersection/union tests never succeed in finding something specific about the node of interest from within any subtree, the method determines if there is another node in the path in step 247 and loops back to step 224 to process the next node in the path.
  • XPATH expressions allow more path information than simple node names, so axes and predicates can be used to help identify targeted content.
  • An axis defines where in the tree to look for content nodes, based on their relationship with the current node. Typical axes are "descendant::," "sibling::," "ancestor::," and "parent::." For example, if the path is "A/B/sibling::C," the path begins at "A", moves to "B", then looks at all of the siblings of "B" to find a "C." In Figure 23, the current node is "A".
  • the "B" node can be found (in both Cases I and II) by the path "descendant::B" wherein any "B" node underneath node "A" will be selected, whether it is a direct descendant (a child node) or further down the tree.
  • the path to the desired "F" node could be written "descendant::E/descendant::F." This path finds a descendant of the "A" node that is an "E" node, and then finds a descendant of the "E" node that is an "F" node. That is how the "turning point" (the "E" node) is implemented in XPath.
  • XPath predicates can be used to describe the desired node's properties. Each step along the XPath can therefore appear as axis::node_name[predicate]. This is useful if, for example, an axis selects all current node descendants as a node-set, but the content must be identified more specifically.
  • a typical predicate will include attributes, among other things. For example, if the target "C" node has a "taco" attribute, the path could be "A/B/C[taco]." This predicate is an example of a specifier, as described above.
  • Both axes and predicates can robustify an XPath expression since they permit a node to be uniquely identified without the fully-specific path.
  • the entire path can be replaced with "descendant::node[specifics]" where specifics is the predicate.
  • FIG. 27 is a flowchart illustrating an axis verification method 260 in accordance with the invention.
  • the descendant:: axis can be used to identify a turning point node if certain conditions are met. This allows structural change to occur between the turning point node and the relative group node. Note that the descendant:: axis can automatically be used from the turning point node to the content, since a predicate that uniquely specifies the content from the turning point node is a necessary condition of having a turning point node.
  • step 262 From the largest sub-tree identified in step 262, which is the subtree of the relative group node, all other nodes of the same type as the turning point node are found in step 264.
  • each node is checked to see if it has a descendant that matches the content node given the specific information about the content node. If any of the descendants match the current node, then the descendant:: axis designation cannot be used since the descendant:: axis designation does not uniquely identify the current node and the method is completed.
  • step 268 it is determined if there are any more nodes and the method loops back to step 266 to test each additional node.
  • the descendant:: axis information may be used for specifying a turning-point only if the above conditions are met.
  • multiple XHTML pages may be used as input to the robustifier wherein each page is marked up as the user wants and they are merged to form a single stylesheet that generates a single new page from the multiple XHTML pages.
  • several pages of the same type e.g., several eBay auction item pages
  • the wireless page generation system's goal is to create a single stylesheet that will correctly transform all of those similar pages.
  • the XPATH robustifier helps make this happen since it tries to account for possible changes in the structure of the auction item pages from item to item.
  • the multiple pages provide additional information to the robustification process.
  • there may be a reverse robustification process in which an XSL stylesheet and several XHTML pages may be input. Based on the stylesheet and the several XHTML targets, the RML for each of the XHTML pages may be generated.
  • a user may manually tweak stylesheets generated by the wireless page generation system and then continue to make additional changes to the XSL from within the GUI environment.
  • Figure 28 is a diagram illustrating a producer adding an atomic to a new page using the integrated desktop GUI interface 50 in accordance with the invention.
  • a user may select a particular web page that is then moved into the project window as described above.
  • the user may then select the construction tab so that the selected web page is converted into XHTML.
  • the user may then select and add atomics 280 to the page navigation portion 54 by highlighting the selected atomics as shown in Figure 28.
  • the user interface may pop up a menu so that the user may select either to add atomics to the root group, add atomics to menu or add atomics to features.
  • the selected atomics have been added to the root node in the page navigation portion 54 as the Intro node.
  • Figures 29a - 29c are diagrams illustrating the producer defining a ruleset in accordance with the invention.
  • the ruleset defines how the wireless page delivery system should transform the content and services from the desktop webpage into a wireless page. Since rulesets often apply to more than one URL, the URL manager permits the producer to define the appropriate ruleset for each URL request.
  • the user has selected a URL 290 to be associated with a stylesheet.
  • the producer may select an element 292 of the URL to be mapped to the stylesheet or select the element from a drop-down menu.
  • the producer may define the settings 294 associated with a particular URL element 296.
  • Figure 30 is a diagram illustrating the producer deploying the ruleset in accordance with the invention.
  • the producer may deploy the project to view the wireless pages on a phone or Palm emulator as shown in Figures 32a and 32b.
  • the deployment manager may send the XSL stylesheet to the wireless page delivery system so that the XSL stylesheet may be used to automatically process the appropriate web page.
  • Figure 31 is a diagram illustrating an example of an XSL stylesheet 300 in accordance with the invention.
  • the stylesheet may be used to automatically process a web page to generate a wireless page in accordance with the invention.
  • Figures 32a and 32b are diagrams illustrating an example of a new page on a cellular phone emulator 302 and on a Palm device emulator 304, respectively.
  • the emulators prior to deploying the XSL stylesheet to the wireless page delivery system, the emulators permit the producer to review the resultant wireless pages.
  • Figures 32a and 32b show the same web page shown in Figure 30 for a phone and then also for a Palm device. Note the differences between the two wireless pages shown since the Palm device is capable of displaying more information than the phone.
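The axis verification method of FIG. 27 described in the list above (steps 262 to 268) can be sketched as follows. This is only an illustration under stated assumptions: the tree is held in Python's `xml.etree.ElementTree`, the "specific information about the content node" is passed in as an ElementTree path string, and the function name `descendant_axis_is_safe` is hypothetical rather than taken from the described system.

```python
import xml.etree.ElementTree as ET

def descendant_axis_is_safe(group_root, turning_point, content_path):
    """Return True if the descendant axis may be used to reach the turning point.

    group_root    -- top of the largest sub-tree (the relative group node)
    turning_point -- the candidate turning-point element
    content_path  -- assumed ElementTree path that matches the content node,
                     e.g. ".//F[@k='1']"
    """
    for node in group_root.iter(turning_point.tag):   # same type as the turning point
        if node is turning_point:
            continue
        if node.find(content_path) is not None:       # another node also leads to matching content
            return False
    return True

ok_tree = ET.fromstring("<A><E><F k='1'/></E><E><G/></E></A>")
bad_tree = ET.fromstring("<A><E><F k='1'/></E><E><F k='1'/></E></A>")
print(descendant_axis_is_safe(ok_tree, ok_tree[0], ".//F[@k='1']"))
print(descendant_axis_is_safe(bad_tree, bad_tree[0], ".//F[@k='1']"))
```

In the second tree both "E" nodes contain a matching "F" node, so a path of the form descendant::E/descendant::F would be ambiguous and the check fails.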

Abstract

A wireless page generation system (22) and method are provided wherein the user may specify the format of an HTML web page destined for a wireless device. The system can handle both static HTML web pages as well as dynamic web pages automatically. The system may include a Graphical User Interface (GUI) tool (24) that permits the user to interact with the system. The system may also include a robustifier that automatically processes pages in order to generate an XSL stylesheet that may extract content from dynamic web pages.

Description

SYSTEM AND METHOD FOR GENERATING A WIRELESS WEB PAGE
Background of the Invention
This invention relates generally to a system and method for permitting a user to analyze a document in order to break the document into a hierarchical structure and generate a new document, and in particular to a system and method for permitting a user to analyze an information source, such as a hypertext markup language (HTML) web page, an XML document, an ICE document (a content syndication format) or a Reuters feed, in order to generate one or more wireless web pages corresponding to the original information source.
Increasingly, it is desirable to be able to decompose a document or web page into its constituent parts. In particular, for a web page written in HTML or some other format, it is desirable to be able to view the web page and divide the web page up into one or more hierarchically related elements that may be known as atomics. An atomic is a small part or portion of the web page. The atomics may be clustered into groups which may in turn be clustered into bigger groups. For example, each top story on a web page may be an atomic while all of the stories together may be treated as a group.
The identification of the atomics in the web page and the hierarchical structure of the atomics is an important task. For example, a system may decompose the web page into the hierarchical atomics and then use the hierarchical atomics to restructure the web page for one or more different users wherein each user may request slightly different parts of the web page or each user is using a device that can only handle certain pieces of the web page due to memory or screen size limitations. The deconstructed web page may also be used for a variety of other purposes. For a typical web page wherein the company would like to distribute the content from that web page to multiple different wireless devices, such as cellular phones, Palm devices, pagers and the like, a group of people must go back to a database of content and re-create each new page for each different type of wireless device, which is a slow, time-consuming process. In addition, if the original web page changes, each of those new pages generated by the group of people must be regenerated. Thus, it is desirable to provide a system that permits a single user, such as the producer of a web page, to more easily deconstruct an information source, such as an HTML web page, an XML document, an ICE document (a content syndication format), a Reuters feed or the like, into its atomics, relate the atomics to each other and assign properties to the atomics so that newly formatted wireless pages for one or more different wireless devices may be automatically generated, and it is to this end that the present invention is directed.
In addition, a web page is often dynamic in that the content in the web page may constantly change or be updated. For example, web pages containing information about the news, trading information, shopping information and the like must be continuously updated to reflect changes. When a web page is dynamic, the process of attempting to generate a new page having a particular format from the original web page is even more difficult. For example, if a designer develops a new page when the original news story web page had two top stories, that new page becomes obsolete as soon as the stories change or the number of top stories changes because the page will no longer be accurate or up to date. Thus, the task of trying to manually generate new pages based on a dynamic web page is extraordinarily difficult and very time consuming since the new pages are quickly out of date. Thus, it is desirable to provide a system for generating the new pages based on the dynamic web page automatically or semi-automatically so that the designer does not need to continuously recreate the new pages as the dynamic web page changes, and it is also to this end that the present invention is directed.
Summary of the Invention
A system and method for generating a wireless web page in accordance with the invention is provided wherein an information source, such as an HTML web page, an XML document, an ICE document (a content syndication format) or a Reuters feed, may be automatically broken down into its constituent parts within a hierarchical format so that a new page having a different format or contents may be automatically generated based on the original information source. The invention is particularly useful in the context of re-purposing an HTML web page for one or more different wireless devices wherein each wireless device has different memory and screen size limitations that make it necessary to generate a differently formatted series of pages (known as cards) for each wireless device. The invention is also particularly useful for generating one or more different wireless pages from a dynamic web page for one or more different wireless devices having different screen sizes as described in more detail below.
The wireless web page generating system in accordance with the invention may be utilized by a producer of a web site who wishes to re-purpose the content from the web page to one or more different wireless devices wherein each different wireless device may have a different screen size so that each wireless web page must have a slightly different format. Using the system in accordance with the invention, the producer may, without help, re-purpose the web page for the multiple different wireless devices so that the wireless pages may be automatically generated by the wireless page delivery system. The system and method for generating a wireless web page in accordance with the invention, known as the Nomad™ Wireless Toolkit, may include a graphical user interface (GUI) portion. The system allows producers of web pages using the Wireless Toolkit to specify how their website content should appear to wireless devices and to then directly communicate the specifications of the desired web page to an intelligent harvesting and navigation system and method so that the wireless pages may be generated. The system also permits the producer to preview their web site content on one or more wireless pages that emulate how it will appear on a wireless device.
In a preferred embodiment, the system permits the producer (also referred to herein as the user) to rapidly process a web page to generate a hierarchical list of atomics contained in the page and to produce a resultant page which has some or all of the atomics from the original web page. For example, if the page is going to be sent to a wireless device with a limited memory or screen size, the web page producer must typically re-format the web page for display on the device with the more limited memory or screen size. The producer may also need to re-format the web page for many different wireless devices wherein each wireless device has a different size screen so that the page generated for each wireless device is unique. With the system in accordance with the invention, however, the producer may define each of the wireless pages for each of the wireless devices so that wireless pages may be automatically generated and wireless pages for dynamic web sites may also be automatically generated.
Thus, in accordance with the invention, an apparatus for processing an information source is provided wherein the apparatus retrieves an information source and extracts one or more elements from the information source wherein each element comprises a piece of content within the information source. The apparatus also generates a data structure that represents the hierarchical structure of the elements in the information source and processes the data structure in order to retrieve predetermined elements from the information source. In extracting the elements from the information source, the apparatus may include a page viewing portion for viewing the page from which elements are being extracted, a page navigator portion for viewing a hierarchical list of elements extracted from the page, a user dragging an element from the page viewing portion to the page navigator portion to extract the element from the page, and an element property portion for viewing the properties of an element in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract elements from the page by simultaneously viewing the page and the hierarchical list of elements.
In generating the data structure, the apparatus converts the information source into a first hierarchical structure containing the content and the hierarchical structure and then determines a generalized path to the element in the information source so that the element is located even if the information source changes. In more detail, the first hierarchical structure comprises one or more nodes each containing an element wherein a particular element is located in a first node of the hierarchical structure, and the generalized path determiner compares the first node containing the data to each other node in the hierarchical structure to identify a unique node identifier. The generalized path determiner also identifies a turning-point node associated with the first node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the first node, and specifies a descendants axis as a turning-point node if there are no descendants of the node that match the first node.
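A minimal sketch of the comparison and turning-point ideas summarized above may help. It assumes, purely for illustration, that the "information" about a node is its set of attribute name/value pairs and that the tree is held in Python's `xml.etree.ElementTree`. The names `info`, `unique_info` and `find_turning_point` are hypothetical and do not come from the described system.

```python
import xml.etree.ElementTree as ET

def info(node):
    """Assumed notion of a node's 'information': its attribute name/value pairs."""
    return frozenset(node.attrib.items())

def unique_info(subtree, target):
    """Return facts that single out `target` among same-type nodes in `subtree`.

    Returns None when neither the intersection test nor the union test succeeds.
    """
    others = [n for n in subtree.iter(target.tag) if n is not target]
    if not others:
        return frozenset()            # no competing node: no predicate is needed
    # Differencing: what the target has that each other node lacks.
    diffs = [info(target) - info(other) for other in others]
    common = frozenset.intersection(*diffs)
    if common:                        # intersection test
        return common
    if all(diffs):                    # union test: valid only if no difference is empty
        return frozenset.union(*diffs)
    return None

def find_turning_point(root, target):
    """Walk down the path from `root` toward `target`, shrinking the sub-tree
    until the tests succeed; return (node, facts) or None."""
    parent = {child: p for p in root.iter() for child in p}
    path = [target]
    while path[-1] is not root:
        path.append(parent[path[-1]])
    for ancestor in reversed(path[1:]):   # root first, target's parent last
        facts = unique_info(ancestor, target)
        if facts is not None:
            return ancestor, facts
    return None

page = ET.fromstring("<A><D><F class='x'/></D><E><F class='x'/></E></A>")
wanted = page.find("./E/F")
node, facts = find_turning_point(page, wanted)
print(node.tag, sorted(facts))   # prints: E []
```

In this toy tree the two "F" nodes are indistinguishable from the top node "A", so both tests fail there; within the sub-tree under "E" the wanted node has no competitor, which makes "E" the turning point. If the tests succeed at the top node itself, no turning point is needed.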
In accordance with another aspect of the invention, a graphical user interface for extracting one or more atomics from an HTML web page is provided that includes a page viewing portion for viewing the page from which atomics and groups of atomics are being extracted, a page navigator portion for viewing a hierarchical list of atomics extracted from the page wherein a user drags an atomic from the page viewing portion to the page navigator portion to extract the atomic from the page, and an atomic property portion for viewing the properties of an atomic in the list of the page navigator portion. The page viewing, page navigator and element property portions permit the user to rapidly extract atomics from the page by simultaneously viewing the page and the hierarchical list of atomics.
In accordance with another aspect of the invention, a graphical user interface for extracting one or more elements from an HTML web page is provided that views a page from which atomics are being extracted, navigates the page by viewing a hierarchical list of atomics extracted from the page wherein the user drags an atomic from the page viewer to the page navigator to extract the atomic from the page, and includes an atomic property generator that extracts the properties from the atomic selected by the user so that the user views the page, the hierarchical list of atomics and the properties for a selected atomic simultaneously.
In accordance with yet another aspect of the invention, a method for processing a web page to re-purpose the web page for one or more wireless devices having different screen formats by determining paths to pieces of content in the web page is provided. The method generates a first hierarchical structure based on the web page wherein the first hierarchical structure comprises the structure of the web page and the content in the web page. The method then generates a second hierarchical structure of the web page from the first hierarchical structure wherein the second hierarchical structure comprises the structure of the web page wherein paths to the content are indicated. The method then generates relative paths to the content in the web page wherein the relative paths are inserted into the second hierarchical structure, and robustifies the paths in the second hierarchical structure so that a search for content using a path to the content locates the content even if the web page has changed.
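To illustrate why a robustified path keeps locating content when a page changes, the following sketch compares a fully specific path with a descendant-plus-predicate path on two hypothetical samples of the same page. The markup, the `class='story'` attribute and the `extract` helper are assumptions made for this example. Python's `xml.etree.ElementTree` implements only a subset of XPath, in which `.//` plays the role of the descendant axis.

```python
import xml.etree.ElementTree as ET

# Two hypothetical samples of one page; the second adds a wrapper <div> and an extra cell.
SAMPLE_1 = ("<html><body><table><tr>"
            "<td class='story'>Top story</td>"
            "</tr></table></body></html>")
SAMPLE_2 = ("<html><body><div><table><tr>"
            "<td>Ad</td><td class='story'>New story</td>"
            "</tr></table></div></body></html>")

ABSOLUTE = "./body/table/tr/td"       # fully specific path from the root
ROBUST = ".//td[@class='story']"      # descendant search plus a predicate

def extract(page, path):
    """Return the text of the first node matching `path`, or None if nothing matches."""
    node = ET.fromstring(page).find(path)
    return None if node is None else node.text

for name, path in (("absolute", ABSOLUTE), ("robust", ROBUST)):
    print(name, extract(SAMPLE_1, path), extract(SAMPLE_2, path))
```

The fully specific path finds the story only in the first sample, because the added wrapper breaks it in the second; the path that relies on a descendant search and a predicate finds the story in both.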
Brief Description of the Drawings
Figure 1 is a block diagram illustrating a wireless page delivery system;
Figure 2 is a block diagram illustrating the wireless web page generation system in accordance with the invention;
Figure 3 is a diagram illustrating an example of a portion of a web page broken down into one or more atomics in accordance with the invention;
Figures 4a - 4c are diagrams illustrating an example of the user interface for the GUI tool and in particular the integrated desktop, HTML viewer and the source viewer, respectively, in accordance with the invention that is within the system shown in Figure 2;
Figure 5 is a diagram illustrating an example of a wireless navigation viewer in accordance with the invention that is within the system shown in Figure 2;
Figure 6 is a diagram illustrating an example of a project manager in accordance with the invention that is within the system shown in Figure 2;
Figure 7 is a diagram illustrating an example of a ruleset addition viewer in accordance with the invention that is within the system shown in Figure 2;
Figure 8 is a diagram illustrating an example of a URL definition manager in accordance with the invention that is within the system shown in Figure 2;
Figures 9A and 9B are diagrams illustrating examples of a wireless features manager in accordance with the invention that is within the system shown in Figure 2;
Figure 10 is a diagram illustrating an example of a deployment manager in accordance with the invention that is within the system shown in Figure 2;
Figure 11 illustrates an example of a dynamic web page wherein the content in the web page has changed between the two samples;
Figure 12 illustrates an example of the HTML tree of the two samples of the web page of Figure 11;
Figure 13 illustrates an example of the relational mark-up language (RML) code for each sample of the web page of Figure 11;
Figure 14 illustrates the unprocessed agnostic RML code (ARML) for each sample of the web page of Figure 11;
Figure 15 illustrates the preprocessed and robustified agnostic RML code (ARML) for each sample of the web page of Figure 11;
Figure 16 illustrates the generalized agnostic RML code (ARML) that is capable of retrieving the appropriate content from either sample of the web page shown in Figure 11;
Figure 17 illustrates an example of the XSL stylesheet that correctly retrieves content from either of the web page samples in accordance with the invention;
Figures 18A and 18B are block diagrams illustrating two embodiments of the XSL generator in accordance with the invention;
Figure 19 is a diagram illustrating an example of group nodes in accordance with the invention;
Figure 20 is a flowchart illustrating the generalizer method in accordance with the invention;
Figure 21 is a diagram illustrating an example of the generalization method in accordance with the invention;
Figure 22 is a block diagram illustrating more details of the XPATH robustifier in accordance with the invention;
Figure 23 is a diagram illustrating an example of the paths to dynamic content;
Figures 24A and 24B are a flowchart illustrating a node comparison method in accordance with the invention;
Figures 25A and 25B are a flowchart illustrating a turning-point node identification method in accordance with the invention;
Figure 26 is a diagram illustrating an example of the turning-point method in accordance with the invention;
Figure 27 is a flowchart illustrating a descendant identification method in accordance with the invention;
Figure 28 is a diagram illustrating a producer adding an atomic to a new page in accordance with the invention;
Figures 29a - 29c are diagrams illustrating the producer defining a ruleset in accordance with the invention;
Figure 30 is a diagram illustrating the producer deploying the ruleset in accordance with the invention;
Figure 31 is a diagram illustrating an example of an XSL stylesheet in accordance with the invention; and
Figures 32a and 32b are diagrams illustrating an example of a new page on a cellular phone and on a Palm device, respectively.
Detailed Description of a Preferred Embodiment
The invention is particularly applicable to deconstructing a web page to generate one or more new wireless pages for one or more different wireless devices and it is in this context that the invention will be described. It will be appreciated, however, that the system and method in accordance with the invention has greater utility, such as to other types of information sources including but not limited to an XML document, an ICE document (a content syndication format) or a Reuters feed or any other type of information feed, where the information source may be deconstructed into one or more elements to generate new pages. To better understand the invention, a wireless page delivery system that may be used in conjunction with the wireless web page generator in accordance with the invention will be briefly described.
Figure 1 is a block diagram illustrating a wireless page delivery system 10 that may be used in conjunction with the wireless page generation system in accordance with the invention. The system will be briefly described herein. A more detailed description may be found in co-pending US Patent Application Serial No. 09/503,797 filed on February 14, 2000 which is owned by the same assignee as the present invention and which is incorporated herein by reference. The system 10 may include one or more content providers or information sources 11, such as companies that would like to be able to deliver their web pages from a web site to one or more different wireless devices wherein each wireless device may require the web page to be formatted in a particular manner due to the size of the screen of the wireless device, the memory of the wireless device or the communications link between the wireless device and the web site.
The system may also include a gateway 12, a web server 13, a wireless communications system 14 to the wireless device and a wireless web page delivery portion 15. The gateway may intercept an incoming HTTP request from a wireless device and route the request to the web server 13 and on to the wireless page delivery portion 15. The wireless page delivery portion 15 may retrieve the actual requested HTML page, reformat the page into one or more cards and decks for the particular wireless device and send the reformatted cards and decks to the wireless device using the web server 13 and the gateway 12.
To carry out the reformatting of the HTML page and other functions, the wireless page delivery portion 15 may further include an appliance connection handler 16, a content connection handler 17, an XML engine 18, a layout engine 19, a rules database 20 and an XSL ruleset database 21. Briefly, the system may receive the incoming HTML page request, retrieve the web page, reformat the HTML page into XHTML, generate an RML document from the XHTML document, format the elements from the RML document into one or more cards and decks to form a presentation shoe that is delivered to the wireless device. The interactions of the portions of the wireless page delivery system are shown in Figure 1 in more detail and further described in the above incorporated co-pending patent application. Therefore, the operation of the wireless page delivery system will not be described in any more detail. Now, the wireless web page generating system in accordance with the invention will be described. Figure 2 is a block diagram illustrating a wireless web page generation system 22 in accordance with the invention. Generally, the web page generation system permits a producer or company with a web site to control the look of its one or more web page when the web pages are downloaded to a wireless device as will be described in more detail below. The wireless web page generation system 22 may include a back-end portion 23 and a front-end portion 24. The front-end portion may also be referred to as a graphical user interface (GUI) tool. In a preferred embodiment of the invention, the back-end portion may include one or more compiled JAVA programs/modules that implement the functions of the back-end as described in more detail below and the front-end may be one or more Visual Basic modules/programs that implement the functions of the front-end (GUI Tool) as described in more detail below. The GUI tool and the back-end may be connected to each other using APIs as is well known.
In more detail, the back-end 23 may further include the web page delivery portion 15 shown in Figure 1, an RML builder module 25, an XSL generator module 26 and a stylesheets database 27. The function of each module will be described herein and a more detailed description of each module will be provided below. As described above, the web page delivery portion 15 may generate XHMTL. The RML builder module 25 may generate an RML document based on a generated ruleset as described in more detail in the incorporated co-pending patent application and output the RML document into the XSL generator 26 that generates an XSL stylesheet based on the RML document. The generated stylesheet may be stored in the database 27. The XSL stylesheet may be used to automatically generate one or more cards from a web page so that the web page may be downloaded and displayed on a wireless device.
The GUI tool 24 may further include a ruleset construction toolset 28, a ruleset database 29, a project construction toolset 30 and a wireless website projects database 31. The Graphical User Interface (GUI) tool enables the user to interact with the application. In particular, using the GUI tool, the user can perform content selections, configuration and deployment for their wireless website project including defining the one or more cards that contain the content of the web site. In a preferred embodiment, the GUI has the look and feel of a standard MS Windows-type application, and conforms to MS Windows application standards.
The ruleset construction toolset 28 may permit the user to create and define rulesets. A ruleset expresses how the wireless page delivery system 15 should transform the content and services from a desktop-centric webpage into one or more cards destined for a wireless device, such as the new formatting for the cards and which content goes on which card. In more detail, a ruleset may also define which URLs use a particular ruleset. The ruleset may also include an XSL stylesheet that specifies how the web page is transformed into one or more wireless pages. Using the ruleset construction toolset, a user can:
1. Create, open, and save rulesets;
2. Select a desktop-centric webpage on which to base a ruleset;
3. Configure a ruleset (select and group the content and services for a web page for wireless delivery);
4. Integrate specialized wireless features into the ruleset;
5. Graphically view the Wireless Navigation Structure of the ruleset; and
6. Deploy the ruleset for testing purposes using a wireless device emulator or an internet-enabled wireless device.
The ruleset construction toolset 28 may receive the XHTML document representing a web page from the web delivery portion 15 and generate one or more rulesets based on the XHTML that may be stored in the database 29. The one or more rulesets, as described below in more detail, determine how the HTML web page will look on the wireless devices when the web page is converted into the wireless web page. The rulesets in the database 29 may be sent to the RML builder 25 that generates the RML document and may also be sent to the project construction toolset 30 that generates the wireless website projects for the incoming web pages as described below. The finished projects are stored in the database 31.
In operation, a producer may interact with the GUI tool to generate a wireless website project which includes information about the look of the HTML web page on the one or more wireless devices. When the producer or user selects a web page, the wireless delivery portion 15 may retrieve that web page and generate an XHTML document corresponding to the web page. Using the ruleset construction toolset, the user may extract, or the system may automatically extract, one or more elements from the web page as described below with reference to Figures 3 and 4. From the extracted elements, known as atomics hereinafter, the user may generate the look of the wireless pages and review the wireless pages. Once the user is satisfied with the wireless pages, one or more rulesets are generated that capture the information about the look of the wireless pages so that the wireless page delivery system 15 (See Figure 1), when it receives a request for a web page, automatically generates the appropriate one or more cards for the wireless device based on the generated rulesets and stylesheets. Thus, once the user defines the rulesets and stylesheets, the wireless page delivery system automatically generates the wireless pages in accordance with the stylesheets.
Using the generated rulesets, the RML builder module 25 and the XSL generator module 26 may generate an RML document and then generate an XSL stylesheet that reflects the producer's requirements as embodied in the rulesets and the RML document. The ruleset may also be used to generate project information that may be combined with the XSL stylesheet to generate a wireless website project that may then be deployed using the wireless web page delivery system as shown in Figure 1. Using the wireless page generation system, the user may specify the format of its web pages on the wireless devices.
The ruleset construction toolset 28 may further include a page viewer module as described with reference to Figures 4a-4c and a wireless navigation viewer module as described below with reference to Figure 5. The project construction toolset 30 may further include a project manager module as described below with reference to Figures 6 and 7, a URL definition manager module as described below with reference to Figure 8, a wireless feature manager module as described below with reference to Figures 9A and 9B, a deployment manager module as described below with reference to Figure 10 and an emulator module that is described below with reference to Figures 32a and 32b. Each of these modules will now be described in more detail.
Figure 3 is a diagram illustrating an example of a portion of a web page 40 broken down into one or more atomics and groups of atomics in accordance with the invention. In particular, the figure shows an example of a portion of an HTML web page 40 from an on-line stock brokerage company. In the Figure, the innermost dashed boxes designate atomics while the boxes enclosing them constitute groups of atomics. An atomic is the basic building block of the web page, such as a word, a quote for a stock, the name of the stock, headlines, paragraphs, links, images, and other basic elements of a page that form the most fundamental elements of the web page. The system may automatically identify the atomics on a web page based on the XHTML document. A group is defined as a user-defined set of atomics that can be nested into a hierarchical construction. These logical, hierarchical sets define the navigation experience for the user, and are the key to building wireless websites.
For example, at the top of the page 40 is a quote look-up form 41. The quote look-up form 41 is made up of a group of three atomics: a "Quotes" title portion 41a, an entry box 41b and a "Go" submission button 41c. Further, a market graph 42a, table 42b, and Fool.com advertisement 42c are each related atomics and are grouped together to constitute group 42. In addition, each element in the market graph 42a may also be an atomic so that "NASDAQ" is an atomic, "2756.27" is an atomic, the down arrow is an atomic and "-5.48" is an atomic.
Finally, the TheStreet.com logo 43a, and the news stories 43b-c are each related atomics and are grouped together as group 43. All of the groups 41, 42, 43 make up the root group 40. These groups constitute the relational hierarchy for this portion of the E-TRADE website. Thus, any web page may be broken down into atomics and groups of atomics so that the atomics and groups of atomics may be reconstituted into a new web page having a different subset of atomics or a different format or the like. Now, the page viewer module of the ruleset construction toolset will be described in more detail.
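The hierarchy of atomics and groups described for Figure 3 can be pictured as a small tree. The following is a minimal Python sketch under stated assumptions: the class names Atomic and Group, the function list_atomics and the label strings are illustrative only and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Atomic:
    """A basic building block of a page (a word, a link, an image, ...)."""
    name: str


@dataclass
class Group:
    """A user-defined set of atomics and/or nested groups."""
    name: str
    children: List[Union["Group", Atomic]] = field(default_factory=list)


def list_atomics(node: Union[Group, Atomic]) -> List[str]:
    """Return the names of all atomics under a node, in document order."""
    if isinstance(node, Atomic):
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(list_atomics(child))
    return names


# Hierarchy loosely following Figure 3 (reference numerals used as labels).
root = Group("root 40", [
    Group("quote look-up form 41", [
        Atomic("Quotes title 41a"), Atomic("entry box 41b"), Atomic("Go button 41c"),
    ]),
    Group("group 42", [
        Atomic("market graph 42a"), Atomic("table 42b"), Atomic("advertisement 42c"),
    ]),
    Group("group 43", [
        Atomic("logo 43a"), Atomic("news story 43b"), Atomic("news story 43c"),
    ]),
])

print(list_atomics(root))
```

Because groups may contain other groups, the same structure could also hold the finer-grained atomics of the market graph 42a as a nested group.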
The breaking down of an information source, such as a web page or the like, into its atomics may have a variety of different uses. In a preferred embodiment, the breaking down of the web pages into atomics permits the web page to be re-purposed for display on one or more different wireless devices having different screen sizes. In particular, since the web page is already broken down into individual atomics, it is possible to automatically or manually assign those atomics to one or more wireless pages for one or more different wireless devices so that the wireless pages may be generated.
Figures 4a - 4c are diagrams illustrating an example of an integrated desktop user interface 50 for the GUI tool in accordance with the invention that is within the system shown in Figure 2. In the example shown, the user interface is designed to permit the user to view all of the tools in the ruleset construction toolset simultaneously as shown. In particular, the user interface 50 may include a page viewer portion 52, a page navigator portion 54, an atomic property portion 56, and one or more tabs 58 that change the item being viewed in the page viewer portion 52. In the example shown, the tabs may include an HTML tab 60 that permits the user to view the HTML code, a construction tab 62 that permits the user to view the graphical page as shown in Figure 3, and a source tab 64 that permits the user to view the source of the page.
The page viewer 52 is the primary work environment for ruleset construction. It displays the desktop-centric webpages that users wish to configure for wireless delivery. Users select elements (atomics or groups of atomics) from the targeted webpage with a mouse, and then set properties for each element. The page viewer permits the user to view the page from which atomics are being extracted or the page that is being created using the extracted atomics in the different item view modes controlled by the tabs. In Figure 4a, an auction page for eBay is shown. Now, the atomic extraction process will be briefly described.
In order to extract the atomics from a web page, which may be HTML, a modified HTML editor, such as Microsoft® DHTML, may be used. The typical HTML editor may display the HTML as it is typically rendered, but does not allow click-throughs to links and also does not allow selection of particular HTML subtrees. For instance, if a piece of text is selected, the typical HTML editor tells the user which node in the HTML tree contains all the selected text. The modified HTML editor in accordance with the invention, however, has been modified to pass back the HTML node with the selected content based on the user click. The atomic extractor then digs into the underlying tree node structure to determine the path to the selected element by iterating through its valid parents (since the control inserts a number of design-time tags that do not exist outside the editor, and some additional validations are applied as well). The path to the selected content represented by the atomic is the absolute path to that atomic and is used later to describe the location of the particular content.
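The walk from a selected node up through its valid parents can be sketched as follows. This is a minimal Python illustration, not the editor control's actual interface: the Node class, the DESIGN_TIME_TAGS set, the tag name "editor-wrapper" and the bracketed index notation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    tag: str
    parent: Optional["Node"] = None
    index: int = 1  # 1-based position among same-tag siblings


# Hypothetical name for a tag an editor control might inject at design time.
DESIGN_TIME_TAGS: Set[str] = {"editor-wrapper"}


def absolute_path(node: Node) -> str:
    """Walk from the selected node up to the root, keeping only valid parents."""
    steps = []
    current: Optional[Node] = node
    while current is not None:
        if current.tag not in DESIGN_TIME_TAGS:
            steps.append(f"{current.tag}[{current.index}]")
        current = current.parent
    return "/" + "/".join(reversed(steps))


html = Node("html")
body = Node("body", html)
wrapper = Node("editor-wrapper", body)  # injected by the editor, not in the real page
table = Node("table", wrapper)
row = Node("tr", table)
cell = Node("td", row, index=2)

print(absolute_path(cell))  # /html[1]/body[1]/table[1]/tr[1]/td[2]
```

Skipping the design-time tag keeps the recorded path valid for the page as it exists outside the editor.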
The page navigator 54 is very similar to the construction view of the page viewer and can initiate all of the construction viewer's functionality. However, the page navigator 54 displays the selected page elements in a tree structure rather than a graphical format. The tree structure is a hierarchical representation of how the selected elements are structured into groups and atomics. Thus, the page navigator portion 54 illustrates the hierarchical structure of the page. In particular, the page is represented as groups and atomics arranged in the hierarchical relationship as shown. The hierarchical relationship of the atomics and groups may be manually generated by the user by dragging an element from the page shown in page viewer portion 52 and dropping it into the navigator portion 54. In addition, the system may automatically extract the atomics and groups from the web page as described above. The user may then arrange the atomics and groups in the navigator portion 54 to accurately reflect the page.
As an atomic in the navigator portion 54 is selected, the properties of that atomic may be displayed in the atomic property portion 56. The atomic property portion 56 allows the user to view and configure properties for each element in the wireless website. The properties may express certain attributes for each element (for example, a property might define which classes of wireless devices may display the element). In the example shown, the properties of the atomic may include its name, its class, its RML path, its HTML path, its tag, a sample of the atomic and any other information. In accordance with the invention, the GUI tool interface permits the user to view all of the above portions simultaneously as shown in Figure 4 so that the user does not have to switch between screens or applications to see all of the necessary information to deconstruct the page and develop a new page.
Figure 4b illustrates the HTML viewer 60 wherein the graphical page generated based on the HTML code is shown. Figure 4c illustrates the source viewer 64 wherein the actual HTML code that generates the graphical page is shown. Now, the wireless navigation viewer module in accordance with the invention that is part of the GUI tool will be described in more detail.
Figure 5 is a diagram illustrating an example of a wireless navigation viewer module 70 in accordance with the invention within the system shown in Figure 2. The wireless navigation viewer 70 allows the user to graphically view the wireless navigational structure of their one or more cards generated based on the ruleset that was created using the ruleset construction toolset as described above. In particular, due to the screen limitations of most wireless devices, a single webpage is typically delivered to a wireless device in a series of presentation decks. These decks contain one or more cards wherein each card contains webpage content that has been formatted appropriately for the screen of each wireless device. Thus, the wireless navigation viewer module presents these decks in graphical form to the user so that the user can ensure that the web page has been divided into the cards in an appropriate manner. Thus, the user may review the results of the ruleset generated using the ruleset construction toolset.
In the example shown in Figure 5, the web page has been broken down by the user's ruleset into a first card 72 containing an auction item that is linked to three other cards 74, 76, 78 containing bid statistics, seller information and a description of the item, respectively. The bid statistics card 74 may have a bid card 79 and a how to bid card 80 linked to it. The description card 78 may have an image card 82 linked to it that shows an image of the item. Thus, for a user with a wireless device, the user may navigate through the cards as shown in Figure 5. Thus, the user of the system is able to ensure that the cards are generated in a logical fashion so that the navigation through the cards is logical. If the user notes a problem with the navigation, he/she may return to the page viewer portion 50 and redo the ruleset to correct the problem. Now, the project construction toolkit in accordance with the invention will be described in more detail.
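The card navigation of Figure 5 can be modelled as a set of links between cards, which makes it easy to check that every card is reachable from the first card. The sketch below is illustrative only: the dictionary CARD_LINKS and the functions reachable_cards and unreachable_cards are hypothetical names, and the card labels simply reuse the reference numerals from the text.

```python
from typing import Dict, List

# Card links from the Figure 5 example; keys link to the cards listed as values.
CARD_LINKS: Dict[str, List[str]] = {
    "auction item 72": ["bid statistics 74", "seller information 76", "description 78"],
    "bid statistics 74": ["bid 79", "how to bid 80"],
    "seller information 76": [],
    "description 78": ["image 82"],
    "bid 79": [],
    "how to bid 80": [],
    "image 82": [],
}


def reachable_cards(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Depth-first walk listing every card a device user can reach from start."""
    seen: List[str] = []
    stack = [start]
    while stack:
        card = stack.pop()
        if card in seen:
            continue
        seen.append(card)
        # Push in reverse so linked cards are visited in their listed order.
        stack.extend(reversed(links.get(card, [])))
    return seen


def unreachable_cards(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Cards that no navigation path reaches; a likely sign of a ruleset problem."""
    reached = set(reachable_cards(start, links))
    return [card for card in links if card not in reached]


print(reachable_cards("auction item 72", CARD_LINKS))
print(unreachable_cards("auction item 72", CARD_LINKS))
```

A non-empty result from unreachable_cards would correspond to the kind of navigation problem that sends the user back to redo the ruleset.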
The project construction toolset 30 allows users to combine one or more rulesets into a wireless website project (WWP). Using the project construction toolset, users can:
1. Structure a WWP by adding and removing rulesets (See Figure 7);
2. Create and maintain a WWP's URL Definition Table (expresses how the rulesets apply to various webpages) (See Figure 8);
3. Integrate specialized wireless features with the WWP (See Figures 9A and 9B);
4. Deploy the WWP for testing purposes using a wireless device emulator or an internet-enabled wireless device; and
5. Deploy the WWP from a test environment into a production environment (See Figure 10).
As described above, the project construction toolset further comprises a project manager module, a URL definition manager module, a wireless feature manager module, a deployment manager module and an emulator module. Each of these modules will now be described.
Figure 6 is a diagram illustrating an example of a user interface 90 for the project manager in accordance with the invention. The project manager allows the user to maintain the set of rulesets that make up a WWP. Using the project manager, users can modify the WWP by adding or removing previously created rulesets as described below with reference to Figure 7. As shown in Figure 6, for the particular project, a list 92 of one or more different rulesets is shown. For each ruleset, the user that created the ruleset, its last update date, its status (deployed or modified) and the URL rule are listed. In addition, the user interface 90 may include one or more buttons 94 that permit the user to, for example, add a ruleset, deploy a ruleset, define a URL, finish the project management or cancel the prior command. Now, an example of the ruleset addition user interface will be described. An example of a ruleset and the wireless pages generated based on the ruleset and the XSL stylesheet will be described below with reference to Figures 22-25b.
Figure 7 is a diagram illustrating an example of a ruleset addition viewer user interface 100 in accordance with the invention. The ruleset addition user interface permits the user to add a ruleset, previously created using the ruleset construction toolset, to a particular project. As shown in Figure 7, the user may select from one or more existing rulesets (StartPage.rs, Categories.rs, ItemList.rs and ItemDescription.rs in this example) and add them into the project. In the example shown, the user has selected the ItemDescription.rs ruleset which defines the cards and decks and formats used to present the description of an item to the user. Now, the URL definition manager in accordance with the invention will be described.
Figure 8 is a diagram illustrating an example of a URL definition manager user interface 110 in accordance with the invention. The URL definition manager allows the user to maintain the URL definition tables for the project. The URL definition tables allow the wireless page delivery system 15 shown in Figure 2 to select the appropriate ruleset for each URL request. In particular, since rulesets often apply to more than one URL, this module enables the user to define how URLs are mapped to rulesets. For example, as shown in Figure 8, the ItemDescription.rs has been selected and the URL definition manager lists one or more tokens 112 that are used to identify whether the particular ruleset applies to the particular URL. In addition, a circle 114 may be filled in for each token indicating whether the particular token must be present in the URL for the ruleset to be used ("Must Have"), whether the token in the URL is not relevant ("Don't Care") or whether the token cannot exist in a URL for the ruleset ("Must Not Have"). In the example shown, a URL must have the following tokens: eBay; com; aw-cgi; and eBayISAPI.dll?ViewItem&Item in order to invoke the ruleset. Thus, for example, a URL that is "http://www.elBay.com..." would not invoke the ruleset since the URL does not contain eBay. Now, the wireless features manager in accordance with the invention will be described.
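The token test described for the URL definition table can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the function name ruleset_applies, the case-sensitive substring test, the extra "help" token and the sample URLs are all hypothetical, and only three of the tokens from the Figure 8 example are used for brevity.

```python
from typing import Dict

MUST_HAVE = "Must Have"
DONT_CARE = "Don't Care"
MUST_NOT_HAVE = "Must Not Have"


def ruleset_applies(url: str, token_rules: Dict[str, str]) -> bool:
    """Return True when the URL satisfies every token requirement."""
    for token, requirement in token_rules.items():
        present = token in url  # assumption: case-sensitive substring test
        if requirement == MUST_HAVE and not present:
            return False
        if requirement == MUST_NOT_HAVE and present:
            return False
        # DONT_CARE tokens never affect the outcome.
    return True


# Three tokens from the Figure 8 example; "help" and "search" are made-up tokens.
item_description_rules = {
    "eBay": MUST_HAVE,
    "com": MUST_HAVE,
    "aw-cgi": MUST_HAVE,
    "search": DONT_CARE,
    "help": MUST_NOT_HAVE,
}

# Made-up URLs used only to exercise the rules.
print(ruleset_applies("http://cgi.eBay.com/aw-cgi/item", item_description_rules))
print(ruleset_applies("http://www.example.com/aw-cgi/item", item_description_rules))
```

A delivery system could run such a test against each ruleset's table in turn and use the first ruleset whose requirements are all met; that selection order is likewise an assumption.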
Figures 9A and 9B are diagrams illustrating examples of user interfaces 120, 122 for the wireless features manager in accordance with the invention. The wireless features manager is used to include certain specialized wireless features that are not available from desktop-centric website content. In particular, the wireless features manager allows the user to integrate these features into their wireless website. For example, the user can include specialized messaging features or revenue features in the WWP, as will be briefly described as an example of the wireless features that may be added. Figures 9A and 9B illustrate an example of adding an E-Revenue feature to the wireless page. In Figure 9A, the wireless feature manager permits the user to add E-Commerce features (including secure sockets and integrated payments) or to add promotions features (including wireless advertising, couponing and sponsorships). In Figure 9B, the user interface 122 for adding the wireless advertising is shown in more detail. The user may specify, for example, the advertising partner and that partner's URL along with information about how often the advertising is going to be shown to the wireless device user. As shown, one or more wireless devices may be chosen for the advertising and each different wireless device may have a different level of advertising. For example, some wireless devices, such as Internet phones and handhelds, are selected and the advertisement may be repeated on every deck presented to the user as opposed to a WAP phone where the advertisement is shown once per session. Thus, the user may customize the features associated with the wireless page using the wireless features manager. Now, the deployment manager will be described in more detail.
Figure 10 is a diagram illustrating an example of a deployment manager user interface 130 in accordance with the invention. The deployment manager allows the user to control the deployment of the WWP. For example, users can use the deployment manager to deploy the WWP either to a testing environment or to a production environment. The deployment manager also includes deployment version control, which allows users to return to previous versions as necessary. Figure 10 shows an example of the deployment manager user interface showing the versions for a project called "whatever.nmd". Now, the back-end of the wireless page generating system in accordance with the invention will be described in more detail.
As described above, the back-end includes the RML builder 25, the XSL generator 26 and the stylesheets database 27 as shown in Figure 2. Each of these modules of the back-end will now be described in more detail.
The RML builder 25 stores and updates agnostic RML documents based on the rulesets generated by the GUI. These documents contain user-specified project data, which has been gathered via the GUI. The data includes not only user-designated groups and atomics, but also user-defined attributes and additional rules for handling content. Agnostic RML documents group data hierarchically, which mirrors the structure of the RML documents that are used as input to the Layout Engine. Now, an example of the process for converting a web page (dynamic in this example) into RML code, then into robustified agnostic RML code and finally into an XSL stylesheet that will extract the desired content from the dynamic web page will be described.
Figure 11 illustrates an example of a dynamic web page wherein the content in two samples 131, 132 of the web page changes. The web page may be dynamic since it changes from the first sample 131 to the second sample 132 which makes it more difficult to accurately extract content as described below. The first sample 131 may include a group 133 and one or more atomics 134 in the group. In particular, the atomics may be "Yin" and "Yang". As shown, the second sample 132 may also have a group 133 and atomics 134. However, in the second sample, the number of atomics in the group 133 has increased by one so that the atomics are "Yin", "Yang" and "Dragon". The number of atomics in the other part of the web page has also been increased by one. For example, the first sample may be the web page at a first predetermined time and the second sample may be the same web page at a second predetermined time when the content has changed. As described above, in a typical system, the people generating the wireless web pages must redo all of the wireless web pages when the web page changes as shown in Figure 11. In accordance with the invention, however, whether the web page looks like the first sample 131 or the second sample 132, the appropriate content from the web page may be extracted. The web page is structurally represented by its HTML tree as shown in Figure 12.
Figure 12 illustrates an example of the HTML trees corresponding to the web page samples of Figure 11. In particular, the hierarchical structure of each sample of the web page is shown and the differences between the two samples are evident. In more detail, the upper parts of the HTML tree are identical since the content and structure of the web page is identical. In the HTML tree for the first sample, a table structure is a parent of a TR group node 133 which is the parent of two TD nodes 134 that contain the atomics "Yin" and "Yang". In contrast, the HTML tree for the second sample has the same table parent node and the same TR group node 133, but there are three TD nodes that contain the three atomics, "Yin", "Yang" and "Dragon". Now, an example of the RML code generated based on the two samples of the web page will be described.
Figure 13 illustrates an example of the relational mark-up language (RML) code 135 generated based on the web page samples of Figure 11. In particular, the RML code may be known as orthodox RML since the code contains the content in contrast to agnostic RML (ARML) code that contains only queries into the structure of the web page, but has the same relational hierarchy as the orthodox RML. The RML code in the current paradigm is typically constructed by applying an XSL stylesheet to the HTML page to convert the HTML to the RML format through the selection of particular pieces of desired content. As shown, the RML code for each sample is different since the second sample contains an extra atomic as described above. A group tag is generated for each group 133 and an atomic tag is generated for each atomic 134. Thus, the RML captures the desired content of the web page and contains it in a hierarchical structure of atomics and groups of atomics.
Figure 14 illustrates an example of unprocessed agnostic RML code (ARML) 136 for each sample. In particular, these represent the ARML before any processing, where each atomic contains a fully-specific path. Next, the ARML code is preprocessed, then robustified and generalized as described below with reference to Figures 15 and 16. A sample of the robustified ARML code 137 for each sample of the web page is shown in Figure 15. Thus, the paths to the desired content are robustified so that even if the web page is dynamic and changes (such as the change between the first sample 131 and the second sample 132 shown in Figure 11), the XSL stylesheet shown in Figure 17 will extract the correct content from the web page. In this example, all of the content in the table should be extracted whether there are two or three pieces of content. Figure 16 illustrates the generalized ARML code in accordance with the invention. Figure 17 illustrates an example of the XSL stylesheet 139 generated based on the ARML code of Figure 16 that correctly processes both of the web page samples in accordance with the invention. Now, two embodiments of the XSL generator 26 will be described in more detail.
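The effect of robustified paths on the two samples can be illustrated with a small Python sketch using the standard library's limited XPath support. The markup and path strings below are assumptions: the actual code of Figures 14 through 17 is not reproduced in this text, so the samples are simplified stand-ins and the function extract is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two simplified samples standing in for the pages of Figure 11.
SAMPLE_1 = "<html><body><table><tr><td>Yin</td><td>Yang</td></tr></table></body></html>"
SAMPLE_2 = ("<html><body><table><tr><td>Yin</td><td>Yang</td><td>Dragon</td>"
            "</tr></table></body></html>")

# Fully specific paths recorded from the first sample (one per atomic).
SPECIFIC_PATHS = ["./body/table/tr/td[1]", "./body/table/tr/td[2]"]
# Generalized path: the positional index on the last step is dropped.
GENERAL_PATH = "./body/table/tr/td"


def extract(page: str, paths: list) -> list:
    """Collect the text of every node matched by the given paths."""
    root = ET.fromstring(page)
    return [node.text for path in paths for node in root.findall(path)]


print(extract(SAMPLE_2, SPECIFIC_PATHS))   # misses the added cell
print(extract(SAMPLE_2, [GENERAL_PATH]))   # picks up all three cells
```

The fully specific paths keep working on the first sample but silently miss the new cell in the second, whereas the single generalized path returns every cell in both.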
Figures 18A and 18B are block diagrams illustrating more details of two embodiments of the XSL generator 26 in accordance with the invention. In particular, the XSL generator receives the agnostic RML from the RML builder and uses it to construct an XSL stylesheet. The XSL stylesheet is used by the wireless page delivery system to automatically generate one or more cards and decks based on the HTML web page. The XSL engine in the wireless page delivery system 15 (known as Catalyst™) will later refer to this stylesheet when it converts XHTML to orthodox RML for reformatting a web page into wireless pages for wireless devices. In accordance with one embodiment of the invention, the XSL generator may automatically create the stylesheet based on the user input as stated above. In a preferred embodiment, the XSL generator may further include an XPATH robustifier as described below that automatically parses through the XPATHs in the RML document and attempts to generalize the paths so that the XSL stylesheets may still extract the proper atomics or groups even when the original web page changes. For example, if a web site typically has its top stories in a table with cells for each story, then, when the web site adds an extra top story into the table, the stylesheet still retrieves the correct content, including the extra top story, because the XPATH robustifier has modified the original stylesheet as described in more detail below. Thus, even dynamic web pages with changing content may be automatically processed in accordance with the invention to properly generate one or more wireless pages or cards.
In one embodiment shown in Figure 18A, which is the preferred embodiment, the XSL generator 26 may include one or more modules that may be software applications being executed by a CPU on a server. The XSL generator 26 may include an XPATH preprocessor module 140, an XPATH robustifier module 142 and an XSL writer module 144. In the embodiment shown in Figure 18B, the XPATH robustifier 142 is removed. Without the XPATH robustifier, the system generates XSL stylesheets that will extract the appropriate information from a web site, but may not be able to extract the proper information if the web site is dynamic. Using the XPATH robustifier in accordance with the preferred embodiment of the invention, a dynamic web page may also be properly processed using the stylesheets because the XPATH robustifier attempts to generalize the XPATHs in the stylesheet so that changes in the web page do not disrupt the stylesheet as described below in more detail. Each module in the XSL generator 26 will now be described in more detail.
The role of the XPATH preprocessor 140 is to define the relative paths to selected content in the RML. An XHTML tree uses an absolute path to the selected content which records every node between the root node and the selected content. However, using relative paths (wherein fewer than all of the nodes from the root node to the selected content need to be recorded) generalizes the stylesheet considerably. In addition, the relative paths may be used by the XPATH robustifier for the generalization method that is described below with reference to Figures 20 and 21. In more detail, to define the relative paths, the XPATH preprocessor first determines the group nodes for selected content. The group nodes are the lowest parent nodes linked to all of the grouped selected content. The preprocessor then defines the paths between the group nodes and their descendants, which are the relative paths.
Figure 19 is a diagram illustrating three examples of identifying the group nodes in accordance with the invention. As shown, an XHTML tree 150 is shown with the absolute path to the selected content 152 (shown as shaded in the diagram) along with an RML tree 154 including a group node 156 and the selected content 152. In the first example, the selected content 152 has the absolute paths "ABDG" and "ABDH" in the XHTML tree and a group node "ABD" is located which is the parent of both pieces of selected content so that the relative path runs from the group node "ABD" to each piece of selected content (Atomic G and Atomic H). In the second example, selected content I and E have absolute paths "ABDI" and "ABE" so that the group node that contains the parents of the selected content is "AB" and the RML tree 154 contains the group node and the two selected content nodes (Atomic DI and Atomic E) as shown. Similarly, in the third example, the selected content E and J have absolute paths "ABE" and "ACFJ" so that the group node that contains the parents of the selected content is "A" and the RML tree 154 contains the group node and the two selected content nodes (Atomic BE and Atomic CFJ) as shown. Thus, the XPATH preprocessor attempts to simplify the path to the selected content by determining the lowest group node in the tree that is the parent for the selected content.
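The group node determination in the three Figure 19 examples amounts to finding the longest common prefix of the absolute paths. The sketch below is one way to express that in Python; the function name split_group_node is hypothetical, and it assumes, as in the figure, that each node is named by a single letter so that a path is a plain string.

```python
from typing import List, Tuple


def split_group_node(absolute_paths: List[str]) -> Tuple[str, List[str]]:
    """Return (group node path, relative paths) for a set of selected content.

    Each path is a string of single-letter node names, as in Figure 19.
    """
    # Longest common prefix of all paths.
    prefix = absolute_paths[0]
    for path in absolute_paths[1:]:
        while not path.startswith(prefix):
            prefix = prefix[:-1]
    # A selected node cannot be its own group node, so stop at its parent.
    shortest = min(len(p) for p in absolute_paths)
    if len(prefix) >= shortest:
        prefix = prefix[:shortest - 1]
    relative = [path[len(prefix):] for path in absolute_paths]
    return prefix, relative


print(split_group_node(["ABDG", "ABDH"]))  # ('ABD', ['G', 'H'])
print(split_group_node(["ABDI", "ABE"]))   # ('AB', ['DI', 'E'])
print(split_group_node(["ABE", "ACFJ"]))   # ('A', ['BE', 'CFJ'])
```

The three calls reproduce the group nodes and relative paths named in the text for the first, second and third examples respectively.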
The XSL writer module 144 creates an XSL stylesheet from the robustified agnostic RML. It does this by constructing a template for each node in the tree (if generalized, some nodes may point to several different pieces of content in the tree). An element handler writes the code for each template. If nodes contain content, the element handler must also implement user-defined methods for handling the content in a specific way. For example, image display may be handled in several ways:
Images appear on enabled devices, otherwise displayed as ALT-tag text;
Images appear on enabled devices, no ALT-tag used on other devices; or
Images appear as ALT-tag text on all devices.
If the user chooses to display images as ALT-tag text on all devices, then the element handler creates code that implements that decision in XSL. These selections are considered auxiliary (that is, non-attribute) preferences. Now, the generalizing method that is carried out in the XPATH robustifier will be described.
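As a rough illustration of how an element handler might turn the three image preferences into XSL, the following sketch builds a small stylesheet as a string. Everything specific in it is an assumption made for the example: the policy names, the function `image_stylesheet`, and the stylesheet parameter `images-enabled` (standing in for whatever device-capability test the real system uses) do not appear in the source.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Literal result element; {@src} and {@alt} are XSLT attribute value templates.
IMAGE = '<img src="{@src}" alt="{@alt}"/>'
ALT_TEXT = '<xsl:value-of select="@alt"/>'


def image_stylesheet(policy):
    """Return a stylesheet whose img template follows the chosen (hypothetical) policy."""
    if policy == "alt_text_everywhere":
        body = ALT_TEXT
    elif policy in ("image_or_alt_text", "image_or_nothing"):
        fallback = ALT_TEXT if policy == "image_or_alt_text" else ""
        body = (
            "<xsl:choose>"
            f'<xsl:when test="$images-enabled">{IMAGE}</xsl:when>'
            f"<xsl:otherwise>{fallback}</xsl:otherwise>"
            "</xsl:choose>"
        )
    else:
        raise ValueError(f"unknown image policy: {policy}")
    return (
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">'
        # Hypothetical parameter: the caller sets it per target device.
        '<xsl:param name="images-enabled" select="false()"/>'
        f'<xsl:template match="img">{body}</xsl:template>'
        "</xsl:stylesheet>"
    )


sheet = image_stylesheet("alt_text_everywhere")
ET.fromstring(sheet)  # raises ParseError if the generated XSL is not well-formed
print(sheet)
```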
Figure 20 is a flowchart illustrating the generalizer method 160 in accordance with the invention. The generalizer method is one example of a technique in accordance with the invention for generalizing the paths to the selected content. Another technique is embodied in the robustifier as described below. Both of the techniques may or may not be used with the wireless page generation system. Both techniques may be used to help the wireless page generation system handle dynamic web pages in which the content may change. Generalization is the process of applying the content selection and formatting of one element to other, similar elements. Generalization takes into account that elements targeted for generalization may occur an arbitrary number of times within an XHTML page. The generalization forces a unit that generates XSL stylesheets (known as a XSL Generator) to account for this by applying templates to similar elements in order to treat them in the same way. In more detail, the paths in the ARML are generalized and then the ARML is used to create an XSL stylesheet which is then applied to an XHTML page to create RML. For purposes of this document, ARML has paths which query the XHTML for content and RML contains the actual content which was accessed during the query. For purposes of this disclosure, the term "pre-processing" or "pre-processor" refers to a software module that takes the absolute paths stored at the leaves of the ARML and relativizes them based upon the hierarchy of groups and atomics in the ARML. The generalization process is a much more complicated process in which the actual paths are changed, using XPath, so that they can match more than one node at a time (which helps in handling changing numbers of similar items on a page).
The generalization method 160 may involve a combination of user input and automatic computation. In this method, the user selects an example of a type of group or atomic (groups and atomics are not explained) in step 162 that may dynamically change in number and then selects or de-selects other atomics or groups which are similar. Also, other atomics or groups can be automatically selected and created by adjusting the amount of content in the XHTML page which should be generalized. For example, the user may elect to remove certain particular elements from the new selected content, or to move further up or down the XHTML tree in step 164 to make the content selection larger or smaller. The user views the selection, and either approves the change or provides more input. From the final amount of selected content, a set of generalized atomics or groups is then defined. The method is based on tree nodes wherein every element is associated with an RML node that also has a corresponding node in the XHTML tree. Every XHTML node is a parent to a sub-tree (the set of all descendant nodes of the parent node). By using navigational buttons to move up and down the XHTML tree in step 164, the user selects a wider or narrower XHTML sub-tree to represent the RML node. An XHTML node that is higher up in the tree is parent to a larger sub-tree, and includes more content. In step 166, it is determined if the user has selected more content and the method loops back to step 162 if more content is being generalized. If there is no more content to generalize, then a general XPATH expression for the generalization is computed in step 168. The method is then completed.
Figure 21 is a diagram illustrating an example of the generalization method in accordance with the invention. In particular, suppose a user selects an atomic representing a "D" node (shaded in the diagram) in the XHTML tree (See Section I of the diagram). This "D" node has a "C" tag as its parent in the XHTML structure. By pushing the "up-arrow" button, the user can move the atomic path up one level to the "C" tag thereby converting the atomic to a group where everything underneath the "C" tag becomes an atomic child (See Section II). If the "C" tag has several "D" tags underneath it, all the "D" tags will be converted into atomics and be marked as "generalized." In accordance with the invention, this method handles a change in number of children (See Section III) since the path is directed to "C" so that any number of atomics "D" underneath "C" will be retrieved. For example, if "C" represents the top stories in a news web page and "D" represents each top story, any extra top stories added to the web site will still be retrieved. In addition, the generalized node "C" may overlook newly inserted undesired children (See Section IV). A similar method may be used to handle the generalization of groups as well. The robustifier will now be described in more detail.
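The effect of directing the path at "C" can be seen in a minimal sketch using Python's standard `xml.etree.ElementTree`. The page snippets, the story text and the function name `top_stories` are made up for illustration. The generalized path is written here as "./C/D", one plausible form of the general expression computed in step 168.

```python
import xml.etree.ElementTree as ET

# Generalized path: every D directly beneath C, however many there are.
GENERALIZED_PATH = "./C/D"


def top_stories(page_source):
    """Return the text of every generalized D atomic under C."""
    root = ET.fromstring(page_source)  # root is the A node
    return [d.text for d in root.findall(GENERALIZED_PATH)]


# Section II of Figure 21: two D children under C.
page_two = "<A><C><D>story 1</D><D>story 2</D></C></A>"
# Sections III and IV: one more D, plus an undesired X child.
page_three = "<A><C><D>story 1</D><X>banner</X><D>story 2</D><D>story 3</D></C></A>"

print(top_stories(page_two))
print(top_stories(page_three))
```

The same path picks up the extra "D" node and skips the inserted "X" node without any change to the stylesheet.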
Figure 22 is a block diagram illustrating more details of the XPATH robustifier 142 in accordance with the invention. In general, the robustifier generalizes a path through a hierarchical structure so that, even if the underlying hierarchical structure or the content in the hierarchical structure changes, the content may still be located in the hierarchical structure during a search. In the context of XSL code generation for queries into HTML or XML structures, the robustifier may also be referred to as the XPATH robustifier. Thus, the robustifier creates paths to selected content that remain valid even after new HTML or XML nodes have been inserted into the structure to be queried. This is accomplished by making XPath node selection as non-specific as possible. The robustifier thus searches for information that is specific to the selected XHTML node, as compared to all similar nodes in an XHTML sub-tree, or the entire XHTML structure. By matching nodes according to this type of information, it creates less specific paths to the same unique set of content.
Figure 23 is a diagram illustrating an example of the paths to content stored in a dynamic structure which demonstrates the need for this approach. In this example, two atomics 180, 182, shown as shaded nodes in the XHTML tree, have been selected for placement into a group. The path to both atomics begins from the "A" node in the XHTML. After that point, the two paths diverge and the paths "B" and "CDEF" find the individual atomics (shown in Section I). In Section II, the structure of the web page and hence the XHTML has changed and the "X" nodes 184 have been introduced into the structure due to the changing structure. While the traditional "B" and "CDEF" paths are no longer valid, robustified path descriptions can still locate the selected content so that the dynamic web page does not interfere with the extraction of the content. In this example, the robustified paths are "descendant::B" and "descendant::E/descendant::F."
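The difference between the fixed and the robustified paths can be tried out with Python's standard `xml.etree.ElementTree`. This is only a sketch under several assumptions: the two trees approximate Figure 23 (the exact position of the second "F" node and of the inserted "X" nodes is guessed), and the `id` attributes exist only so the example can report which node was found. ElementTree does not accept axis names, so "descendant::B" is written in the abbreviated form ".//B", which selects the same nodes from the "A" node.

```python
import xml.etree.ElementTree as ET

# Section I: original structure (assumed layout).
PAGE_BEFORE = (
    "<A><B id='b'/><C><D><E><F id='wanted'/></E><F id='other'/></D></C></A>"
)
# Section II: X nodes inserted by the changing page (assumed positions).
PAGE_AFTER = (
    "<A><X><B id='b'/></X>"
    "<C><X><D><E><F id='wanted'/></E><F id='other'/></D></X></C></A>"
)


def ids(page_source, path):
    """Return the id attribute of every node matched by path from the A node."""
    root = ET.fromstring(page_source)
    return [node.get("id") for node in root.findall(path)]


for page in (PAGE_BEFORE, PAGE_AFTER):
    print("fixed B:     ", ids(page, "./B"))
    print("fixed CDEF:  ", ids(page, "./C/D/E/F"))
    print("robust B:    ", ids(page, ".//B"))
    print("robust E, F: ", ids(page, ".//E//F"))
```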
Returning to Figure 22, the XPATH robustifier may include one or more modules that are implemented using one or more software applications or modules. The modules may include a comparer module 170, a turning-point node identifier module 172 and a module for determining whether or not a descendant axis can be used, also called an axis verifier 174. These modules function to automatically determine robust paths to selected content so that, even if the structure of the hierarchical data storage changes such as with a dynamic web page, the appropriate content from the dynamic web page may be located. The robustification process can be split up into several individual methods implemented by the modules described above. In general, the methods include: a comparison method for determining how similar a set of XHTML nodes are to each other to identify a node of interest, and in particular how a node containing desired content is different from other nodes of the same type in an XHTML tree or sub-tree; a method to find parent "turning-point" nodes of a desired content-containing node that help to improve the search, by effectively partitioning the page into smaller regions; and a method to determine whether or not a descendant:: axis can be placed in front of turning-point nodes.
Each of these methods will be described in more detail below. For purposes of the robustification process, several assumptions are made about how the structure of an XHTML document will change due to a dynamic web page. The first assumption is that certain particular nodes, such as the turning-point nodes, the content-containing nodes and the relative group nodes, will not change type or be removed from the page. (A relative group node is the node in the XHTML tree having the smallest sub-tree that contains all of the children of an RML group node.) The second assumption is that these nodes will retain their original relationships in terms of what is a descendant of what in the tree. The final assumption is that only one XHTML page is given as input to the process; however, other incarnations of the method can include multiple XHTML pages as additional input to help determine the validity of information used during the robustification and generalization processes. Given those assumptions, the robustifier will create paths that will work for any other changes that may occur to the XHTML tree structure so that various dynamic web pages may have wireless pages generated from them in accordance with the invention. As an example, a web site may have one or more news stories wherein each story is contained in a cell of a table. If the web site adds another story by creating another cell in the table, the generalized paths will locate and extract the extra story since the type of the atomic, a table, did not change.
In other embodiments of the robustifier, this process may be made recursive. In the preferred embodiment, there is only the possibility of one turning point node. However, once a turning point node is found, the path from the relative group node to the turning point node is another sub-path that can then be robustified the same way. In addition, predicates can be found for the turning point node and another turning point node can be found between the original turning point node and the relative group node. Then the process can be repeated for the new turning point node, and so on. By allowing this kind of recursion, there is a better likelihood that the full specific path to the content will never need to be used except in rare situations.
The entire robustification process is organized to prefer certain types of results. The most preferred result is that the node can be identified by its local properties alone. This is tested first using the comparison method described below. If this fails, then the node is tested to see if it can be identified by local properties, given that it is within a specific sub-tree of the relative group's sub-tree where the root node of the specific sub-tree is a turning-point node. If that fails, then the future recursive version of the algorithm will attempt to find a specific sub-tree of the tree inside another specific sub-tree of the relative group's sub-tree in which the content can be uniquely identified, and so on and so on recursively by finding more than one turning point node. If all of this fails, then the only possibility is the full, specific path to the content. Now, the node comparison method in accordance with the invention will be described.
Figures 24A and 24B are a flowchart illustrating a node comparison method 190 in accordance with the invention for identifying a node of interest. In particular, the robustifier and the comparison method must determine how to distinguish the desired content from content that is of a similar type, yet is undesired. For example, in Figure 23, one of the "F" nodes is desired, while the other is not and it is important to determine how to distinguish the two nodes to ensure that robustification process extracts the proper node. Thus, to distinguish between desired and undesired content, the robustifier selects and defines a "node of interest" which is what has been previously referred to as the desired node. The node of interest is defined by a set of specifiers, which can be used in place of the specific path to distinguish the node of interest from all other "mismatch" nodes wherein the specifiers may include characteristics, such as which attributes or children the node has. These specifiers allow the robustifier to compare each potential mismatch in the parent node's sub-tree to the node of interest, and are described in more detail below.
The comparison method uses the following information about a node as a basis for its comparison: the siblings of the node, the descendants of the node, the direct children of the node, the attributes of the node and the position of the node among its siblings. This set of information can easily be changed to include other information, such as its direct parent, the attributes of the child nodes, etc. The actual set of information being used for the comparison is not necessarily a fundamental aspect of the method and may vary. All that is necessary for the method is that there is a set of information to be used as a basis of comparison, and it can easily be changed to accommodate varying needs.
In general, the comparison method attempts to determine what is unique about the node we are interested in, which is often a node containing some content, but may also be a relative group node. A particular class, called an XHTMLInformer, is used to contain the information about a node, and this class is also used to make comparisons. For example, two XHTMLInformers can be intersected, just like sets in mathematics, returning an XHTMLInformer containing only that information that is shared between two nodes. The other operations include differences and unions, which return what is in one node but not the other, or what is in either node, respectively.
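The set-like behaviour described for the XHTMLInformer can be sketched with plain Python sets. The real class is not disclosed in the text, so the function `informer`, the tuple encoding of each fact and the sample markup are all assumptions. The sketch records the kinds of information listed above (attributes, direct children, descendants, siblings and position) and relies on the built-in set operators for intersection, difference and union.

```python
import xml.etree.ElementTree as ET


def informer(node, parent=None):
    """Collect comparable facts about a node as a frozenset of (kind, value)."""
    facts = {("attr", name) for name in node.attrib}
    facts |= {("child", child.tag) for child in node}
    facts |= {("descendant", d.tag) for d in node.iter() if d is not node}
    if parent is not None:
        siblings = list(parent)
        facts.add(("position", siblings.index(node) + 1))
        facts |= {("sibling", s.tag) for s in siblings if s is not node}
    return frozenset(facts)


parent = ET.fromstring("<P><F pizza='1'/><F burger='1'><G/></F></P>")
first, second = list(parent)
a, b = informer(first, parent), informer(second, parent)

print(sorted(a & b, key=str))  # intersection: information shared by both nodes
print(sorted(a - b, key=str))  # difference: what the first node has that the second lacks
print(sorted(a | b, key=str))  # union: information found in either node
```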
Each XHTML node of interest is represented by an RML node in the current paradigm within which this invention is being described, and each RML node has a parent node (except for the root <rml> node). The preprocessing method guarantees that the XHTML node represented by an RML node's parent is in fact an ancestor in the XHTML tree of that RML node's XHTML node.
The method begins by searching using the entire sub-tree of the XHTML node corresponding to the RML parent of the RML node of interest in step 192. This subtree is traversed in step 194 to find all XHTML nodes that are of the same type of the XHTML node corresponding to the RML node of interest in step 196. Each node that is found during the traversal is compared to the XHTML node of interest in step 198 by using a differencing operation. The differencing operation finds, for each found node, what makes it different from the XHTML node of interest. The result of the differencing is stored in an XHTMLInformer in step 200 that contains what is in the XHTML node of interest that is not in the other node. The method then determines if there are more located nodes in step 202 and loops back to step 198 to process the other located nodes. If all of the nodes are processed to generate the XHTMLInformers, the method continues.
The above process results in a list of XHTMLInformers, each of which contains information about the node of interest that makes it different from a particular other node in the tree. All of these XHTMLInformers are intersected in step 204, yielding a single set of information (stored in another XHTMLInformer) that describes what makes the node of interest different from all the other nodes. This is called the intersection test in step 206: if the XHTMLInformer at the end of this comparison is non-empty, there is something about the node of interest that makes it unique in step 208. This information may then be placed in a predicate in step 208 to specify this node uniquely in an XPath expression. Thus, if this test succeeds, the entire path can be replaced with "descendant::node[information]" in step 210, where the information is that which is contained in the XHTMLInformer at the end of the comparison at step 208 and the robustifier method is finished processing this particular node of interest.
If the intersection test fails in step 206, there is no single piece of information that is unique to the node of interest. However, there may be one piece of information distinguishing it from some of the nodes and another piece of information distinguishing it from the rest. As an example, imagine that there are four nodes that are differenced with the node of interest. The differences can be the following four sets: [ABC],[BCD],[CDA],[DAB]. Note that the intersection of these four sets is the null set, but the union of the sets is [ABCD]. By noting that the current node has all of [ABCD] while none of the other nodes have all of these information pieces, the set [ABCD] can be used to uniquely specify the node. This is called the union test (step 212) and it only works if none of the differences are empty (if a difference is empty, it means that the node which was differenced with the node of interest is indistinguishable from the node of interest).
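The intersection test and the union test can be written down in a few lines. This sketch represents each node's information as a plain Python set of labels; the function name `unique_specifiers` and the return convention (a set of specifiers, or `None` when both tests fail) are choices made for the example and are not taken from the source.

```python
def unique_specifiers(node_info, other_infos):
    """Return information that singles out the node of interest, or None."""
    # One difference per other node: what the node of interest has that it lacks.
    differences = [node_info - other for other in other_infos]
    if not differences:
        return set()  # no other node of this type, so no predicate is needed

    # Intersection test: a single piece of information unique to the node.
    common = set.intersection(*differences)
    if common:
        return common

    # Union test: valid only if every difference is non-empty, that is, if
    # no other node is indistinguishable from the node of interest.
    if all(differences):
        return set.union(*differences)

    return None  # both tests fail; a relative path must be used


node = {"A", "B", "C", "D"}
others = [{"D"}, {"A"}, {"B"}, {"C"}]  # differences: ABC, BCD, CDA, DAB
print(unique_specifiers(node, others))
```

With the four differences from the text, the intersection is empty and the union test returns the full set [ABCD].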
If the union test is successful (step 214), then the method loops to steps 208 and 210 in which the path is replaced. In summary, if the intersection test or the union test is successful, not only have we determined that no relative path is necessary, but we've also determined what is specific about the node. That information can be directly used as a predicate in the XPath expression. If both tests fail, then a relative path must be used for the node of interest since no other more generic assignment uniquely identifies the node. Now, the turning-point node identification method in accordance with the invention will be described.
Figures 25A and 25B are a flowchart illustrating a turning-point node identification method 220 in accordance with the invention. In particular, if the specifier list for a node is too large or too small, then the robustifier must use another technique than the comparison method to help identify the correct path to the selected content. The finding of turning-point nodes is one of those methods of identification. Turning point nodes are defined as nodes that have been identified as important components of the path to the content, or "vital turning points" in the tree. For example, in Figure 23, there are two "F" nodes with no specifiers to distinguish one from the other. However, if the path must pass through an "E" node, the "F" node selection set is narrowed to the one desired "F" node. Therefore, the "E" node is a vital turning point in the tree in that it can be used to identify the desired node while avoiding the unwanted node.
By identifying the turning point nodes, entire regions of potential mismatches on the page can be ruled out. For example, as shown in Figure 26, a page 217 is presented as a matrix of tables. In XHTML, tables use table row tags and table column tags to partition the page into regions where content can be placed. The ability to identify content by a particular table row and column is essentially equivalent to identifying a particular node in the XHTML as a turning point. By focusing on the turning point's sub-tree, the robustifier can locate selected content while only searching a particular region on the rendered page. Thus, as shown in Figure 26, a turning-point node 218 and a turning-point group 219 are shown.
Returning to Figures 25A and 25B, if the union test of the comparison method fails, then a turning-point node somewhere along the relative path may be located using the turning-point identification method. The turning-point method makes the search space smaller and hopefully the possibility of finding something unique about the node of interest greater. In the method, we begin at the top of the sub-tree, at the XHTML node at the top of the relative path stored at the RML node of interest in step 222. The sub-tree can be made smaller by moving down one level in the tree to the next parent, and then the entire process of performing the intersection/union tests can be repeated as shown in steps 224-244, which correspond to the steps in the comparison algorithm and will not be described herein.
If the intersection/union test suddenly succeeds on this smaller sub-tree, then that next parent is designated a turning-point node in step 246. The turning-point node becomes important because it designates a smaller sub-tree within which the node of interest can be uniquely specified when it couldn't be uniquely specified from within a larger sub-tree. If no turning-point is found, i.e. the intersection/union tests never succeed in finding something specific about the node of interest from within any subtree, the method determines if there is another node in the path in step 247 and loops back to step 224 to process the next node in the path. If no turning-point node is found in the path to the desired node, the robustification process fails and the fully-specific path to the node of interest must be used in step 248. Now, a technique for identifying whether a descendant:: axis can be validly applied to the XPath expression for a turning-point node will be described.
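The walk down the relative path can be sketched as follows. This is a simplified illustration and not the disclosed implementation: node information is limited to attribute names and child tags, the helper names `facts`, `specifiers` and `find_turning_point` are hypothetical, and the sample tree only approximates Figure 23.

```python
import xml.etree.ElementTree as ET


def facts(node):
    """Simplified node information: attribute names and direct child tags."""
    return {("attr", k) for k in node.attrib} | {("child", c.tag) for c in node}


def specifiers(target, subtree_root):
    """Intersection/union tests for target against same-type nodes in a sub-tree."""
    others = [n for n in subtree_root.iter(target.tag) if n is not target]
    differences = [facts(target) - facts(other) for other in others]
    if not differences:
        return set()  # target is the only node of its type in this sub-tree
    common = set.intersection(*differences)
    if common:
        return common
    if all(differences):
        return set.union(*differences)
    return None


def find_turning_point(path_nodes):
    """path_nodes runs from the relative group node down to the node of interest.

    Returns (turning-point node, specifiers), or None if the fully-specific
    path has to be used.
    """
    target = path_nodes[-1]
    for candidate in path_nodes[1:-1]:  # move down one level at a time
        found = specifiers(target, candidate)
        if found is not None:
            return candidate, found
    return None


root = ET.fromstring("<A><B/><C><D><E><F/></E><F/></D></C></A>")
c = root.find("C")
d = c.find("D")
e = d.find("E")
wanted_f = e.find("F")

print(specifiers(wanted_f, root))  # None: the two F nodes are indistinguishable
turning_point, found = find_turning_point([root, c, d, e, wanted_f])
print(turning_point.tag, found)
```

In this tree the tests fail in the sub-trees of "A", "C" and "D" and first succeed inside "E", so "E" is reported as the turning-point node.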
XPATH expressions allow more path information than simple node names, so axes and predicates can be used to help identify targeted content. An axis defines where in the tree to look for content nodes, based on their relationship with the current node. Typical axes are "descendant::," "sibling::," "ancestor::," and "parent::." For example, if the path is "A/B/sibling::C," the path begins at "A", moves to "B", then looks at all of the siblings of "B" to find a "C." In Figure 23, the current node is "A". The "B" node can be found (in both Cases I and II) by the path "descendant::B" wherein any "B" node underneath node "A" will be selected, whether it is a direct descendant (a child node) or further down the tree. Similarly, the path to the desired "F" node could be written "descendant::E/descendant::F." This path finds a descendant of the "A" node that is an "E" node, and then finds a descendant of the "E" node that is an "F" node. That is how the "turning point" (the "E" node) is implemented in XPath.
In addition to the axes described above, there may also be predicates. In particular, one or more XPath predicates can be used to describe the desired node's properties. Each step along the XPath can therefore appear as "axis::node_name[predicate]." This is useful if, for example, an axis selects all current node descendants as a node-set, but the content must be identified more specifically. A typical predicate will include attributes, among other things. For example, if the target "C" node has a "taco" attribute, the path could be "A/B/C[taco]." This predicate is an example of a specifier, as described above.
Both axes and predicates can robustify an XPath expression since they permit a node to be uniquely identified without the fully-specific path. In fact, if the content node of interest has an adequate number of specifiers to identify it within the largest sub-tree, the entire path can be replaced with "descendant::node[specifics]" where specifics is the predicate. For example, if the target "F" node has a single attribute called "pizza" and the other "F" node has a single attribute called "burger," we could distinguish them without the use of the "E" node: a robust path from the "A" node would be "descendant::F[pizza]." The predicate is used to specify the node of interest. These predicates may also be used for uniquely determining turning point nodes. Figure 27 is a flowchart illustrating an axis verification method 260 in accordance with the invention. In particular, the descendant:: axis can be used to identify a turning point node if certain conditions are met. This allows structural change to occur between the turning point node and the relative group node. Note that the descendant:: axis can automatically be used from the turning point node to the content, since a predicate that uniquely specifies the content from the turning point node is a necessary condition of having a turning point node.
In the method, from the largest sub-tree identified in step 262, which is the subtree of the relative group node, all other nodes of the same type as the turning point node are found in step 264. In step 266, each node is checked to see if it has a descendant that matches the content node given the specific information about the content node. If any of the descendants match the current node, then the descendant:: axis designation cannot be used since the descendant:: axis designation does not uniquely identify the current node and the method is completed. In step 268, it is determined if there are any more nodes and the method loops back to step 266 to test each additional node. If, after testing all of the nodes, none of those nodes has such a descendant, then it is safe to use the descendant:: axis for the turning point node from the relative group node in step 270. Thus, the descendant:: axis information may be used for specifying a turning-point only if the above conditions are met.
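The "pizza" and "burger" example can be checked with Python's standard `xml.etree.ElementTree`, under the assumption of a small tree resembling Figure 23. Note that this sketch writes the attribute test as "[@pizza]", which is how standard XPath spells an attribute predicate, and uses the abbreviated ".//F" form for the descendant axis because ElementTree does not accept axis names.

```python
import xml.etree.ElementTree as ET

# Assumed tree: two F nodes that differ only in one attribute.
root = ET.fromstring(
    "<A><B/><C><D><E><F pizza='1'>wanted</F></E><F burger='1'>other</F></D></C></A>"
)

# Without a predicate the descendant path matches both F nodes.
print([f.text for f in root.findall(".//F")])

# With the predicate, the E turning point is not needed.
print([f.text for f in root.findall(".//F[@pizza]")])
```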
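The check of steps 262 to 270 can be sketched as a single function. The name `descendant_axis_is_safe`, the use of attribute names as the "specific information" about the content node, and the two sample trees are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def descendant_axis_is_safe(group_root, turning_point, content_tag, required_attrs):
    """Return True if no other node of the turning point's type has a matching descendant."""
    for node in group_root.iter(turning_point.tag):
        if node is turning_point:
            continue  # only the other nodes of the same type are tested
        for descendant in node.iter(content_tag):
            if descendant is node:
                continue
            # A descendant matches if it carries all the required attributes.
            if required_attrs <= set(descendant.attrib):
                return False  # the path would also reach this other node
    return True


# Only one E node: the descendant axis can be used for it.
single = ET.fromstring("<A><C><D><E><F/></E><F/></D></C></A>")
print(descendant_axis_is_safe(single, single.find(".//E"), "F", set()))

# A second E node also has an F beneath it: the axis is ambiguous.
double = ET.fromstring("<A><C><E><F/></E><E><F/></E></C></A>")
print(descendant_axis_is_safe(double, double.find(".//E"), "F", set()))
```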
In accordance with another embodiment of the robustifier, multiple XHTML pages may be used as input to the robustifier wherein each page is marked up as the user wants and they are merged to form a single stylesheet that generates a single new page from the multiple XHTML pages. In particular, several pages of the same type (e.g., several eBay auction item pages) may be used. For example, suppose that the user wants to put all of eBay's auction items onto a wireless device. The wireless page generation system's goal is to create a single stylesheet that will correctly transform all of those similar pages. Originally, the user had to generate a single example auction item page and the stylesheet produced should then work on all auction item pages. The XPATH robustifier helps make this happen since it tries to account for possible changes in the structure of the auction item pages from item to item.
However, now suppose the user is capable of defining how they want the stylesheet to behave on several pages of the same type. These pages are qualitatively similar; however, their tree structures in XHTML may be slightly different. The differences in structure among several example pages are, in effect, a sampling of the ways in which a page might actually change. This sampling provides additional clues to the robustifier, so that it won't have to blindly guess how pages might change, but actually has examples of change available as extra information. To better understand the multiple pages robustification in accordance with the invention, several examples will be provided.
First, suppose that the robustifier would ordinarily select a particular node "E" in a tree as a turning-point node. However, by looking at the other trees, the robustification process notices that the "E" node is not always present. This forces the robustifier to disqualify the "E" node as a turning-point node and to select another turning-point node instead. Thus, the multiple pages provide the robustification process with additional information about how to robustify the RML code. Second, suppose the robustifier finds that a particular attribute "pizza" helps to uniquely identify a node of interest. Then suppose that in at least one of the other examples, "pizza" is not present as an attribute of the same node of interest. That disqualifies "pizza" as a valid identifier for that node of interest and forces the robustifier to either look at other attributes or find a turning-point node to narrow the search. Again, the multiple pages provide additional information to the robustification process. In accordance with yet another embodiment of the invention, there may be a reverse robustification process in which an XSL stylesheet and several XHTML pages may be input. Based on the stylesheet and the several XHTML targets, the RML for each of the XHTML pages may be generated. Using the reverse robustification process, a user may manually tweak stylesheets generated by the wireless page generation system and then continue to make additional changes to the XSL from within the GUI environment. Now, an example of the wireless page generation process in accordance with the invention will be described.
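Both disqualification rules amount to keeping only what holds on every example page. The sketch below shows one way this might look; the function names, the sample pages and the idea of passing the specifiers found on each page as a list of sets are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def tag_present_everywhere(tag, pages):
    """A turning-point candidate survives only if every page contains it."""
    return all(ET.fromstring(page).find(f".//{tag}") is not None for page in pages)


def surviving_specifiers(specifiers_per_page):
    """Keep only the specifiers found for the node of interest on every page."""
    return set.intersection(*(set(found) for found in specifiers_per_page))


pages = [
    "<A><C><D><E><F pizza='1' size='2'/></E></D></C></A>",
    "<A><C><D><F size='2'/></D></C></A>",  # no E node, no pizza attribute
]

print(tag_present_everywhere("E", pages))  # E is disqualified as a turning point
print(tag_present_everywhere("D", pages))  # D remains a candidate
print(surviving_specifiers([{"pizza", "size"}, {"size"}]))
```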
Figure 28 is a diagram illustrating a producer adding an atomic to a new page using the integrated desktop GUI interface 50 in accordance with the invention. In particular, a user may select a particular web page that is then moved into the project window as described above. The user may then select the construction tab so that the selected web page is converted into XHTML. The user may then select and add atomics 280 to the page navigation portion 54 by highlighting the selected atomics as shown in Figure 28. The user interface may pop up a menu so that the user may select either to add atomics to the root group, add atomics to menu or add atomics to features. In the example shown, the selected atomics have been added to the root node in the page navigation portion 54 as the Intro node.
Figures 29a-29c are diagrams illustrating the producer defining a ruleset in accordance with the invention. The ruleset defines how the wireless page delivery system should transform the content and services from the desktop webpage into a wireless page. Since rulesets often apply to more than one URL, the URL manager permits the producer to define the appropriate ruleset for each URL request. As shown in Figure 29a, the user has selected a URL 290 to be associated with a stylesheet. As shown in Figure 29b, the producer may select an element 292 of the URL to be mapped to the stylesheet or select the element from a drop-down menu. As shown in Figure 29c, the producer may define the settings 294 associated with a particular URL element 296. In the example shown, the URL element "this=that" must be in the URL and the value of the element (e.g., this=that) is important so that a URL with an element that is "this=other" will not be processed using the particular ruleset and stylesheet.
Figure 30 is a diagram illustrating the producer deploying the ruleset in accordance with the invention. In particular, once the navigation tree in the navigation portion 54 is constructed, the producer may deploy the project to view the wireless pages on a phone or Palm emulator as shown in Figures 32a and 32b. The deployment manager may send the XSL stylesheet to the wireless page delivery system so that the XSL stylesheet may be used to automatically process the appropriate web page. Figure 31 is a diagram illustrating an example of an XSL stylesheet 300 in accordance with the invention. The stylesheet may be used to automatically process a web page to generate a wireless page in accordance with the invention.
Figures 32a and 32b are diagrams illustrating an example of a new page on a cellular phone emulator 302 and on a Palm device emulator 304, respectively. In particular, prior to deploying the XSL stylesheet to the wireless page delivery system, the emulators permit the producer to review the resultant wireless pages. Figures 32a and 32b show the same web page shown in Figure 30 for a phone and then also for a Palm device. Note the differences between the two wireless pages shown since the Palm device is capable of displaying more information than the phone.
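The "this=that" rule can be illustrated with the standard `urllib.parse` module. The function name `ruleset_applies`, the dictionary of required elements and the example URLs are hypothetical; the sketch only shows the matching behaviour described above, in which both the element and its value must be present.

```python
from urllib.parse import parse_qsl, urlsplit


def ruleset_applies(url, required_elements):
    """Return True if every required name=value element appears in the URL query."""
    query = dict(parse_qsl(urlsplit(url).query))
    return all(query.get(name) == value for name, value in required_elements.items())


rule = {"this": "that"}  # settings for one URL element, as in Figure 29c
print(ruleset_applies("http://www.example.com/item?this=that&page=2", rule))
print(ruleset_applies("http://www.example.com/item?this=other", rule))
```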
While the foregoing has been with reference to a particular embodiment of the invention, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims. For example, the system described herein may be used to process information from a variety of different information sources including an XML document, an ICE document (a content syndication format) or Reuters feed.

Claims

1. An apparatus for processing an information source, comprising:
means for retrieving an information source;
means for extracting one or more elements from the information source, each element comprising a piece of content within the information source;
means for generating a data structure that represents the hierarchical structure of the elements in the information source; and
means for processing the data structure in order to retrieve predetermined elements from the information source.
2. The apparatus of Claim 1, wherein the extracting means further comprises a page viewing portion for viewing the page from which elements are being extracted, a page navigator portion for viewing a hierarchical list of elements extracted from the page, a user dragging an element from the page viewing portion to the page navigator portion to extract the element from the page, and an element property portion for viewing the properties of an element in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract elements from the page by simultaneously viewing the page and the hierarchical list of elements.
3. The apparatus of Claim 2, wherein the page comprises an HTML web page and the elements further comprise atomics and groups of atomics.
4. The apparatus of Claim 3 further comprising an HTML viewing portion showing the HTML code of the web page, a construction portion showing the graphical construction of the web page and a source portion showing the source code of the web page.
5. The apparatus of Claim 1, wherein data structure generating means comprises means for converting the information source into a first hierarchical structure containing the content and the hierarchical structure and means for determining a generalized path to the element in the information source so that the element is located even if the information source changes.
6. The apparatus of Claim 5, wherein the first hierarchical structure comprises at one or more nodes each containing an element wherein a particular element is located in a first node of the hierarchical structure and wherein the generalized path determiner comprises means for comparing the desired node containing the data to each other node in the hierarchical structure to construct a unique node identifier, means for identifying a turning-point node associated with the desired node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the desired node, and means for discovering if applying a descendants axis to a turning-point node is valid, which occurs if there are no descendants of the node that match the desired node.
7. The apparatus of Claim 6, wherein the comparing means further comprises means for identifying nodes of the same type as the desired node in a subtree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, means for generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes, means for determining the actual pieces of node information that are used to uniquely identify the desired node.
8. The apparatus of Claim 7, wherein the determining means further comprises means for determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
9. The apparatus of Claim 6, wherein the turning-point identifying means further comprises means for identifying nodes of the same type as the desired node in a sub-tree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, means for generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes, means for determining the actual pieces of node information that are used to uniquely identify the desired node.
10. The apparatus of Claim 9, wherein the determining means further comprises means for determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
11. The apparatus of Claim 10, wherein the axis discovery means further comprises means for identifying the largest sub-tree in the hierarchical structure with a node from the relative path as the root that contains the first node, means for identifying all nodes with the same type as the turning-point node, means for determining if the descendants of each identified node match the descendants of the turning-point node and means for validly and safely assigning the descendant axis to the turning-point node if no descendants of the identified node match the descendants of the turning-point node.
12. The apparatus of Claim 5, wherein the hierarchical structure comprises a tree structure associated with a web page and wherein the data in the desired node comprises a piece of content associated with the web page.
13. The apparatus of Claim 5, wherein the generalized path determining means for traversing the hierarchical structure in order to determine a generalized path identifier through the hierarchical structure to the first node.
14. The apparatus of Claim 1, wherein the information source comprises a web page in one or more of HTML and XML formats and wherein the hierarchical structure comprises relational markup language.
15. The apparatus of Claim 14, wherein the processing means comprises an XSL stylesheet.
16. A method for processing an information source, comprising:
retrieving an information source;
extracting one or more elements from the information source, each element comprising a piece of content within the information source;
generating a data structure that represents the hierarchical structure of the elements in the information source; and
processing the data structure in order to retrieve predetermined elements from the information source.
17. The method of Claim 16, wherein the extracting further comprises viewing the page from which elements are being extracted, viewing a hierarchical list of elements extracted from the page, a user dragging an element from the page viewing portion to the page navigator portion to extract the element from the page, and viewing the properties of an element in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract elements from the page by simultaneously viewing the page and the hierarchical list of elements.
18. The method of Claim 17, wherein the page comprises an HTML web page and the elements further comprise atomics and groups of atomics.
19. The method of Claim 18 further comprising showing the HTML code of the web page, showing the graphical construction of the web page and showing the source code of the web page.
20. The method of Claim 16, wherein data structure generating comprises converting the information source into a first hierarchical structure containing the content and the hierarchical structure and determining a generalized path to the element in the information source so that the element is located even if the information source changes.
21. The method of Claim 20, wherein the first hierarchical structure comprises at one or more nodes each containing an element wherein a particular element is located in a first node of the hierarchical structure and wherein the generalized path determiner comprises comparing the desired node containing the data to each other node in the hierarchical structure to construct a unique node identifier, identifying a turning-point node associated with the desired node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the desired node, and discovering if applying a descendants axis to a turning-point node is valid, which occurs if there are no descendants of the node that match the desired node.
22. The method of Claim 21, wherein the comparing further comprises identifying nodes of the same type as the desired node in a sub-tree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes and determining the actual pieces of node information that are used to uniquely identify the desired node.
23. The method of Claim 22, wherein the determining further comprises determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
24. The method of Claim 21, wherein the turning-point identifying further comprises identifying nodes of the same type as the desired node in a sub-tree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes and determining the actual pieces of node information that are used to uniquely identify the desired node.
25. The method of Claim 24, wherein the determining further comprises determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
26. The method of Claim 25, wherein the axis discovery further comprises identifying the largest sub-tree in the hierarchical structure with a node from the relative path as the root that contains the first node, identifying all nodes with the same type as the turning-point node, means for determining if the descendants of each identified node match the descendants of the turning-point node and validly and safely assigning the descendant axis to the turning-point node if no descendants of the identified node match the descendants of the turning-point node.
27. The method of Claim 20, wherein the hierarchical structure comprises a tree structure associated with a web page and wherein the data in the desired node comprises a piece of content associated with the web page.
28. The method of Claim 20, wherein the generalized path determining comprises traversing the hierarchical structure in order to determine a generalized path identifier through the hierarchical structure to the first node.
29. The method of Claim 16, wherein the information source comprises a web page in one or more of HTML and XML formats and wherein the hierarchical structure comprises relational markup language.
30. The method of Claim 29, wherein the processing comprises generating an XSL stylesheet.
31. A graphical user interface for extracting one or more atomics from an HTML web page, comprising:
a page viewing portion for viewing the page from which atomics and groups of atomics are being extracted;
a page navigator portion for viewing a hierarchical list of atomics extracted from the page, a user dragging an atomic from the page viewing portion to the page navigator portion to extract the atomic from the page; an atomic property portion for viewing the properties of an atomic in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract atomics from the page by simultaneously viewing the page and the hierarchical list of atomics.
32. A graphical user interface for extracting one or more elements from a HTML web page, comprising:
means for viewing a page from which atomics are being extracted;
means for navigating the page comprising means for viewing a hierarchical list of atomics extracted from the page wherein the user drags an atomic from the page viewing means to the page navigator means to extract the atomic from the page; and
an atomic property generator means for extracting the properties from the atomic selected by the user so that the user views the page, the hierarchical list of atomics and the properties for a selected atomic simultaneously.
33. A method for generating a hierarchical representation of a web page, the hierarchical representation having atomics and groups of atomics, the method comprising:
selecting a graphical representation of an atomic from the page being viewed by the user;
dragging the graphical representation of the atomic to a page navigator portion so that the atomic is shown in a hierarchical relationship to other atomics in the page; and automatically extracting the properties of the atomic from the atomic when selected by the user so that the user may view the properties of the atomic.
34. A method for processing a web page to re-purpose the web page for one or more wireless devices having different screen formats by determining paths to pieces of content in the web page, comprising:
generating a first hierarchical structure based on the web page, the first hierarchical structure comprising the structure of the web page and the content in the web page;
generating a second hierarchical structure of the web page from the first hierarchical structure, the second hierarchical structure comprising the structure of the web page wherein paths to the content are indicated;
generating relative paths to the content in the web page wherein the relative paths are inserted into the second hierarchical structure; and
robustifying the paths in the second hierarchical structure so that a search for content using a path to the content locates the content even if the web page has changed.
35. An apparatus for processing an information source, comprising:
means for retrieving an information source;
means for extracting one or more elements from the information source, each element comprising a piece of content within the information source, wherein the extracting means further comprises a page viewing portion for viewing the page from which elements are being extracted, a page navigator portion for viewing a hierarchical list of elements extracted from the page, a user dragging an element from the page viewing portion to the page navigator portion to extract the element from the page, and an element property portion for viewing the properties of an element in the list of the page navigator portion, the page viewing, page navigator and element property portions permitting the user to rapidly extract elements from the page by simultaneously viewing the page and the hierarchical list of elements;
means for generating a data structure that represents the hierarchical structure of the elements in the information source, wherein the data structure generating means comprises means for converting the information source into a first hierarchical structure containing the content and the hierarchical structure and means for determining a generalized path to the element in the information source so that the element is located even if the information source changes; and
wherein the first hierarchical structure comprises at one or more nodes each containing an element wherein a particular element is located in a first node of the hierarchical structure and wherein the generalized path determiner comprises means for comparing a first node containing the data to each other node in the hierarchical structure to identify a unique node identifier, means for identifying a turning-point node associated with the first node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the first node, and means for specifying a descendants axis as a turning-point node if there are no descendants of the node that match the first node.
36. The apparatus of Claim 35, wherein the page comprises an HTML web page and the elements further comprise atomics and groups of atomics.
37. The apparatus of Claim 36 further comprising an HTML viewing portion showing the HTML code of the web page, a construction portion showing the graphical construction of the web page and a source portion showing the source code of the web page.
38. The apparatus of Claim 35, wherein the determiner comprises means for comparing the desired node containing the data to each other node in the hierarchical structure to construct a unique node identifier, means for identifying a turning-point node associated with the desired node if a unique identifier is not located during the comparison, the turning point node being a node of the hierarchical structure that uniquely identifies the desired node, and means for discovering if applying a descendants axis to a turning-point node is valid, which occurs if there are no descendants of the node that match the desired node.
39. The apparatus of Claim 38, wherein the comparing means further comprises means for identifying nodes of the same type as the desired node in a subtree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, means for generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes, means for determining the actual pieces of node information that are used to uniquely identify the desired node.
40. The apparatus of Claim 39, wherein the determining means further comprises means for determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
41. The apparatus of Claim 38, wherein the turning-point identifying means further comprises means for identifying nodes of the same type as the desired node in a sub-tree of the hierarchical structure which has at its root node the first node of the fully specific path to the desired node, means for generating a comparison of each identified node with the desired node to determine what set of pieces of node information may be used to uniquely identify the desired node from the identified nodes, means for determining the actual pieces of node information that are used to uniquely identify the desired node.
42. The apparatus of Claim 41, wherein the determining means further comprises means for determining the intersection and union of the set of pieces of node information in order to determine the actual pieces of node information that are used to uniquely identify the desired node.
43. The apparatus of Claim 42, wherein the axis discovery means further comprises means for identifying the largest sub-tree in the hierarchical structure with a node from the relative path as the root that contains the first node, means for identifying all nodes with the same type as the turning-point node, means for determining if the descendants of each identified node match the descendants of the turning-point node and means for validly and safely assigning the descendant axis to the turning-point node if no descendants of the identified node match the descendants of the turning-point node.
PCT/US2001/016576 2000-05-22 2001-05-22 System and method for generating a wireless web page WO2001090873A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001264810A AU2001264810A1 (en) 2000-05-22 2001-05-22 System and method for generating a wireless web page

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57670300A 2000-05-22 2000-05-22
US09/576,703 2000-05-22

Publications (1)

Publication Number Publication Date
WO2001090873A1 true WO2001090873A1 (en) 2001-11-29

Family

ID=24305611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/016576 WO2001090873A1 (en) 2000-05-22 2001-05-22 System and method for generating a wireless web page

Country Status (3)

Country Link
JP (1) JP2002024227A (en)
AU (1) AU2001264810A1 (en)
WO (1) WO2001090873A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4722697B2 (en) * 2005-12-26 2011-07-13 株式会社日立ソリューションズ Information display system
KR102639324B1 (en) * 2023-10-31 2024-02-21 (주)플랜아이 Web service construction automation system and method, web service provision method using the same

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6199098B1 (en) * 1996-02-23 2001-03-06 Silicon Graphics, Inc. Method and apparatus for providing an expandable, hierarchical index in a hypertextual, client-server environment
US6035330A (en) * 1996-03-29 2000-03-07 British Telecommunications World wide web navigational mapping system and method
US5870559A (en) * 1996-10-15 1999-02-09 Mercury Interactive Software system and associated methods for facilitating the analysis and management of web sites
US5958008A (en) * 1996-10-15 1999-09-28 Mercury Interactive Corporation Software system and associated methods for scanning and mapping dynamically-generated web documents
US6144962A (en) * 1996-10-15 2000-11-07 Mercury Interactive Corporation Visualization of web sites and hierarchical data structures
US6237006B1 (en) * 1996-10-15 2001-05-22 Mercury Interactive Corporation Methods for graphically representing web sites and hierarchical node structures

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2382174A (en) * 2001-11-20 2003-05-21 Hewlett Packard Co Data formatting in a platform independent manner
US8949461B2 (en) 2001-12-20 2015-02-03 Blackberry Limited Method and apparatus for providing content to media devices
WO2002076058A3 (en) * 2001-12-20 2003-09-18 Research In Motion Ltd Method and apparatus for providing content to media devices
US7213200B2 (en) * 2002-04-23 2007-05-01 International Business Machines Corporation Selectable methods for generating robust XPath expressions
WO2003105027A1 (en) * 2002-06-07 2003-12-18 Net Clue Corporation Improved web browser
US8825801B2 (en) 2002-06-18 2014-09-02 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US11526911B2 (en) 2002-06-18 2022-12-13 Mobile Data Technologies Llc Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9032039B2 (en) 2002-06-18 2015-05-12 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US8135801B2 (en) 2002-06-18 2012-03-13 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9619578B2 (en) 2002-06-18 2017-04-11 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US8793336B2 (en) 2002-06-18 2014-07-29 Wireless Ink Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US10839427B2 (en) 2002-06-18 2020-11-17 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
US9922348B2 (en) 2002-06-18 2018-03-20 Engagelogic Corporation Method, apparatus and system for management of information content for enhanced accessibility over wireless communication networks
EP1376408A3 (en) * 2002-06-28 2005-10-12 Nippon Telegraph and Telephone Corporation Extraction of information from structured documents
US7730104B2 (en) 2002-06-28 2010-06-01 Nippon Telegraph And Telephone Corporation Extraction of information from structured documents
EP1686499A3 (en) * 2002-06-28 2007-12-12 Nippon Telegraph and Telephone Corporation Selection and extraction of information from structured documents
EP1686499A2 (en) 2002-06-28 2006-08-02 Nippon Telegraph and Telephone Corporation Selection and extraction of information from structured documents
EP1542140A3 (en) * 2003-12-13 2006-04-26 Samsung Electronics Co., Ltd. Method and apparatus for managing data written in markup language
US7844644B2 (en) 2003-12-13 2010-11-30 Samsung Electronics Co., Ltd. Method and apparatus for managing data written in markup language and computer-readable recording medium for recording a program
EP1542140A2 (en) * 2003-12-13 2005-06-15 Samsung Electronics Co., Ltd. Method and apparatus for managing data written in markup language
EP1681644A1 (en) * 2005-01-14 2006-07-19 FatLens, Inc. Method and system to identify records that relate to a predefined context in a data set
EP1681643A1 (en) * 2005-01-14 2006-07-19 FatLens, Inc. Method and system for information extraction
US7770106B2 (en) 2006-03-17 2010-08-03 Microsoft Corporation Dynamic generation of compliant style sheets from non-compliant style sheets
WO2008035044A3 (en) * 2006-09-18 2008-08-28 Yann Emmanuel Motte Methods and apparatus for selection of information and web page generation
WO2008035044A2 (en) * 2006-09-18 2008-03-27 Yann Emmanuel Motte Methods and apparatus for selection of information and web page generation
US8166054B2 (en) 2008-05-29 2012-04-24 International Business Machines Corporation System and method for adaptively locating dynamic web page elements
US9524506B2 (en) 2011-10-21 2016-12-20 Bigmachines, Inc. Methods and apparatus for maintaining business rules in a configuration system
CN112035722A (en) * 2020-08-04 2020-12-04 北京启明星辰信息安全技术有限公司 Method and device for extracting dynamic webpage information and computer readable storage medium
CN112035722B (en) * 2020-08-04 2023-10-13 北京启明星辰信息安全技术有限公司 Method, device and computer readable storage medium for extracting dynamic webpage information
US20230028620A1 (en) * 2021-07-21 2023-01-26 Yext, Inc. Streaming static web page generation
US11816177B2 (en) * 2021-07-21 2023-11-14 Yext, Inc. Streaming static web page generation

Also Published As

Publication number Publication date
JP2002024227A (en) 2002-01-25
AU2001264810A1 (en) 2001-12-03

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 20020878

Country of ref document: UZ

Kind code of ref document: A

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 (EPO FORM 1205) OF 02.05.03

122 Ep: pct application non-entry in european phase