CA2382969A1 - Method for extracting digests reformatting and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation - Google Patents

Method for extracting digests reformatting and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation

Info

Publication number
CA2382969A1
CA2382969A1 CA002382969A CA2382969A CA2382969A1 CA 2382969 A1 CA2382969 A1 CA 2382969A1 CA 002382969 A CA002382969 A CA 002382969A CA 2382969 A CA2382969 A CA 2382969A CA 2382969 A1 CA2382969 A1 CA 2382969A1
Authority
CA
Canada
Prior art keywords
document
source
script
online
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002382969A
Other languages
French (fr)
Inventor
Vadim Maslov
Zakhar Sapozhnin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Divine Tech Ventures
Original Assignee
Databites, Inc.
Vadim Maslov
Zakhar Sapozhnin
Divine Technology Ventures
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US14991199P priority Critical
Priority to US60/149,911 priority
Priority to US09/548,718 priority patent/US6538673B1/en
Priority to US09/548,718 priority
Priority to US09/634,481 priority
Priority to US63448100A priority
Application filed by Databites, Inc., Vadim Maslov, Zakhar Sapozhnin, Divine Technology Ventures filed Critical Databites, Inc.
Priority to PCT/US2000/023140 priority patent/WO2001014951A2/en
Publication of CA2382969A1 publication Critical patent/CA2382969A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

A method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation is provided for structured online documents such as HTML, XML, and SGML documents, or any document that has an internal structure that can be represented by a tree. A digest of an online document is a collection of fragments (30) of this document which are of interest to a user. The system is based on a technique whereby a user selects a fragment of an online document shown in a source window (10) and copies this fragment to a target window (70) that contains the reformatted digest. The system generates a sequence of web site navigation commands, online document tree navigation commands, and fragment copy commands that cause the assembly of the reformatted digest in the target window (20). The user can later ask the system to replay the generated commands, thus causing the automatic creation of the reformatted digest of the changed version of the online document. The digest documents can be displayed by user agents running on wireless and portable computer devices that have bandwidth and computational power limitations.

Description

METHOD FOR EXTRACTING DIGESTS, REFORMATTING
AND AUTOMATIC MONITORING OF STRUCTURED
ONLINE DOCUMENTS BASED ON VISUAL PROGRAMMING
OF DOCUMENT TREE NAVIGATION AND TRANSFORMATION
FIELD OF THE INVENTION
The present invention relates to a method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation. More particularly, the invention relates to a system and method whereby a user selects a fragment of an online document shown in a source window and copies this fragment to the target window, and the system creates a sequence of commands that can reproduce this behavior when applied to new versions of the source documents downloaded from the information source, such as a web site.
BACKGROUND OF THE INVENTION
Structured online documents, especially HTML and XML documents available on the World Wide Web (WWW), have become very important in the past few years. Such documents contain data which may be periodically updated, wherein such updating does not substantially change the format of presentation of such data.
These online documents usually are dynamically generated by the web servers and they present data stored in online databases. This data periodically changes, but since these documents are automatically generated by computers, the presentation document structure remains substantially the same for relatively long periods of time.
Additionally, even when the web page is updated manually, the presentation document structure may remain substantially the same for relatively long periods of time.
Examples of such frequently updated online documents include: stock quotes from brokerage web sites; prices of specific items from online commercial vendor sites and from online auction sites; local weather information from weather web sites; airline ticket information provided by airline or travel sites; shipment tracking information from the mail delivery companies; current news headlines from the news organizations' web sites; latest press releases of a specific company issued on their web site; bank account balances for an individual or corporation from the bank web site.
While all this data may be of great interest to the user, it is often accompanied by data that is unimportant or even irrelevant to a particular user. This irrelevant data unnecessarily complicates comprehension and interpretation of the relevant data and often leads to the user missing important changes in the relevant data.
Examples of the data that may be unimportant to the user are:
1. Stock quotes for a stock of interest to the user are often accompanied by other data such as number of shares outstanding, opening and closing prices, earnings in the last quarter and so on. While the user may need to check this data once every 2 or 3 months, the user is not likely to want to see this data every time a current stock quote is sought.
2. A fluctuating price for an item in an online store that interests the user may be accompanied by advertising for other items that the user has no interest in, or it may be accompanied by product photographs which the user has already seen many times.
3. Balances of the user's bank accounts may appear in separate online documents (web pages) and be accompanied by the last 10 transactions. The user, however, wants to monitor only the balances of all his or her accounts in the bank, so that every balance appears in a small window unaccompanied by any other information.
In addition to this, if the user wants to monitor important data, he or she will find it necessary to push the browser "Reload" button to obtain the latest data from the remote database. This requires considerable manual effort and can be fatiguing even when monitoring one online document. The manual effort required for monitoring several online documents simultaneously is so great that it makes such monitoring very difficult, if not impossible, to do on a regular basis.
Summary. Online documents generated by online databases provide valuable data that a user may want to monitor. However, this essential information is often accompanied by large quantities of non-essential and even irrelevant information, or information that rarely changes and does not need to be monitored.
Therefore, a method is needed that allows a user to automate monitoring of essential data extracted from online documents while ignoring non-essential or irrelevant data.
In the remainder of this Section we present the state of the art in the technical area of this invention and show how this invention differs from the state of the art.
HTML browsers and DOM
HTML and XML structured online documents are displayed using web browsers such as Navigator by Netscape® Communications (http://www.netscape.com) and Internet Explorer by Microsoft® Corporation (http://www.microsoft.com/).
A web browser is used in the preferred embodiment of the present invention.
However, none of the browsers known to us can display a document fragment in a separate window with no window treatments, so that irrelevant information is hidden from the user and the window takes up little space on the user's screen. Also, none of the browsers known to us implements automatic refresh.
The present invention augments the browser behavior and it uses the ability of the more advanced browsers to be controlled by other applications. Also the present invention uses the Document Object Model (DOM) to navigate the content of an online document represented as a tree of nodes.

Web site server-side customizations
Most major web sites allow limited server-side customization of their content these days. Examples are MyYahoo® (http://www.yahoo.com/), MyNetscape® (http://www.netscape.com/), etc. These customizations are nothing more than accounts created for users on these web sites. Users see the customized content when they log in to their accounts on the web site.
Web site customizations provide a limited choice of what can be customized.
For example, the user usually can select a portfolio of stocks to be displayed, but he or she usually cannot select what parameters are presented for a particular stock. Also, such customizations are usually limited to very few online data categories. For instance, a user can monitor all U.S. stocks using such customization, but he or she cannot monitor, say, Brazilian stocks, even though stock quotes for Brazilian stocks may be available online.
Furthermore, creating user-customized web site content requires complicated and therefore expensive programming from the web site maintainers, so this option is not practical for smaller web sites because of its price and complexity.
Finally, server-customized web pages are still shown in a regular web browser window that has a lot of unnecessary window treatments, and the user is still required to push the "Reload" button every time he or she wants to update.


Using the present invention, the user can arbitrarily customize and monitor any web page content and select any presentation format for the customized content, and no programming is required on either the web server side or the user side.
Online data providers
Several online services exist that can push certain online data, such as stock quotes, to the user's wired or wireless device, such as a pager or computer.
These services compare to the present invention in the same way as server-side web site customizations, because they have the same problems: a limited choice of content that can be monitored, no way to arbitrarily customize the presentation of such content or the parameters that are included, and a need for expensive server-side programming.
XML and XSLT
Several techniques exist that transform a higher level abstract document presentation to the lower level document presentation used for rendering the document. The most notable effort in this area is the XSLT language (http://www.w3c.org/), which is used to write programs that transform XML documents (http://www.w3c.org/) to HTML documents that are rendered in a web browser.
These techniques do not cover the present invention because they are used to synthesize lower level document presentation from the higher level document presentation but they do not change the content of the document. The present invention is primarily used to change the content of the document without changing the level of abstraction used in the document presentation.
Related Patents
U.S. Patent No. 5,530,852 to Meske, Jr., teaches how to build web sites that store news articles and serve them to users through the Internet, providing categorization and search services. A typical news article is a structured document that has a title, summary (profile), and body. However, Patent No. 5,530,852 teaches processing news articles in the web server space, and not in the client space.
Also, Patent No. 5,530,852 teaches programming of reformatting by a highly skilled computer programmer, while the present invention teaches creation of a reformatting script by a non-programmer user.
U.S. Patent No. 5,737,592 to Nguyen et al. teaches how to build server-side programs that receive queries from a web browser, automatically convert them to SQL queries, run these queries on a database, convert records returned by the database to HTML, and send this HTML back to the requester. The present invention is different from this patent because it applies on the client side and not on the server side, and we are not concerned with generation of SQL queries.
U.S. Patent Nos. 5,745,754 to Lagarde et al. and 5,752,246 to Rogers et al. teach how to build server-side programs that use Distributed Integration Solution servers to perform extraction of data requested by a user from databases, and presentation of this data in HTML. These teachings would be of use to a highly skilled programmer who programs web applications in extracting and reformatting data in a database. But they are different from the present invention, because we teach how a non-programmer user can create reformatting scripts on the client side.
U.S. Patent No. 5,774,123 to Matron teaches how to record a sequence of navigation commands performed by a user on the web browser and how to later replay these commands, causing the browser to repeat the navigation session.
The record-and-replay feature of this patent does not teach extracting digests of online documents, nor does this patent teach extracting document digests using document trees and displaying the digests in a separate window.
U.S. Patent No. 5,799,304 to Miller teaches how a user agent can filter, i.e. wholly display or wholly reject, a news article based on criteria provided by the user. That is, it teaches how to make search engines more intelligent by using agent technologies. This patent does not relate to extraction of document digests.
U.S. Patent No. 5,890,152 to Rapaport teaches how to build a web search engine that takes into account user characteristics such as IQ, etc., all stored in a personal profile database. This patent does not relate to the present invention, because we are not concerned with user characteristics at all.
U.S. Patent Nos. 5,895,476 and 5,903,902 to Orr et al. are concerned with server side generation of online documents from the specialized higher level representations of documents. This is different from the present invention because the present invention applies on the client side and it does not change the transformed document's level of abstraction.

Accordingly, it is a problem in the art to automatically monitor user-selected fragments of online documents and to create scripts that perform such monitoring when such scripts are to be created visually by a user, without requiring the user to write a program of any kind.
SUMMARY OF THE INVENTION
From the foregoing, it is seen that it is a problem in the art to provide a device meeting the above requirements. According to the present invention, a device is provided which meets the aforementioned requirements and needs in the prior art.
Specifically, the device according to the present invention provides a method for extracting digests of structured online documents, and automatic monitoring of the said digests. A digest of an online document is a collection of fragments of this document which are of interest to a user. Creation of the scripts that perform the said digest extraction and monitoring employs visual programming of the online document tree navigation and transformation. The disclosed method can be applied to structured online documents such as HTML, XML, and SGML documents, or to any other online document that has an internal structure that can be represented by a tree.
More specifically, the system according to the present invention is based on visual programming whereby a user selects a fragment of an online document shown in the source window and copies this fragment to the target window that contains the reformatted digest. The system according to the present invention generates a sequence of web site navigation commands, online document tree navigation commands, and copy commands that cause the assembly of the reformatted digest in the target window. The user can later ask the system to replay the sequence of generated commands, thus causing automatic creation of the reformatted digest of the changed version of the online document.
Therefore, according to the present invention, when the content of the original document changes and the script that creates the digest is run, the change is automatically propagated to the digest document. This allows implementation of simple automatic monitoring of digests of the online documents which occurs entirely in the user space, that is, in the application that controls the user's browser.
The digest document is typically much smaller than the original document, and usually it does not contain computationally intensive and bandwidth intensive multimedia elements such as graphics, sounds, scripts and controls. This considerably lowers the screen size, bandwidth and processing power requirements for user agents that display document digests. Therefore, document digests can be displayed by user agents that run on wireless and portable computing devices. Such devices have small screens, and their bandwidth and computational power resources are limited.
The preferred embodiment of the present invention is a computer program that is called WebTransformer™. It runs on Microsoft Windows® 32-bit operating systems and, as of the filing date, it controls Microsoft Internet Explorer.
Vocabulary
Source Document and Source Window. The source window typically contains a regular browser such as Microsoft Internet Explorer. In this window the source online document is shown. It is used to navigate to the web page of interest and to select a fragment of this page to be monitored.
Target Document and Target Window. The target window is where the digest of the source document is displayed. The digest of the source document that the user monitors is also called the target document. The target window is typically much smaller than the source window and it does not have window treatments such as menu bars and scroll bars, so that it is possible to have many such windows on one screen.
Command - Elementary instruction to perform operation on a document tree that can be recorded.
Script - A recorded or otherwise created sequence of commands.
How It Works
The user typically performs the following actions in order to use the present invention.
First, the user browses documents in the source window and when seeing a document of interest selects a fragment of the document that constitutes a digest.
Selection is performed by clicking the desired element of the web page. This click is translated by the browser into the address of the node in the document tree that represents the minimal HTML element that covers the clicked area.

The user can then use the arrow keys of a computer keyboard to extend, contract, or move the selection sideways. Other selection mouse clicks and keyboard keys may be used depending on the web browser.
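The selection step can be illustrated with a minimal Python sketch. It is not the WebTransformer implementation: the Node class, the use of character offsets in place of screen areas, and the mapping of "extend", "contract" and "move sideways" to parent, first child and sibling are all assumptions made for illustration.

```python
class Node:
    """Hypothetical tree node covering the character span [start, end]."""
    def __init__(self, tag, start, end, children=()):
        self.tag, self.start, self.end = tag, start, end
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def minimal_covering(node, sel_start, sel_end):
    """Return the deepest node whose span contains the whole selection."""
    for child in node.children:
        if child.start <= sel_start and sel_end <= child.end:
            return minimal_covering(child, sel_start, sel_end)
    return node


def extend(node):
    # Assumed meaning of "extend": select the parent element.
    return node.parent or node


def contract(node):
    # Assumed meaning of "contract": select the first child element.
    return node.children[0] if node.children else node


def sideways(node, step):
    # Assumed meaning of "move sideways": select a sibling, clamped at the ends.
    if node.parent is None:
        return node
    siblings = node.parent.children
    index = siblings.index(node) + step
    return siblings[max(0, min(index, len(siblings) - 1))]


# A tiny made-up document: <body><tr><td>..</td><td><b>..</b></td></tr></body>
b = Node("b", 11, 17)
label = Node("td", 0, 10)
value = Node("td", 10, 18, [b])
row = Node("tr", 0, 18, [label, value])
body = Node("body", 0, 18, [row])

# A single click is treated as a selection of zero width.
clicked = minimal_covering(body, 13, 13)
```

Here the click at offset 13 resolves to the B node, and extend(clicked) widens the selection to the enclosing table cell.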
When the user finishes selecting the fragment, the user invokes the user interface "Copy" command that copies contents of the selected fragment from the source window to the target window.
In addition to that, according to the present invention the WebTransformer creates a script that records the source document location, the sequence of document tree navigation commands that leads from the tree root to the node that corresponds to the selected fragment, and the "Copy To Target" command.
The system can record all elements of user navigation including entering User ID and Password or filling out and submitting other online forms that cause the desired navigation.
Finally, according to the present invention the user can ask the WebTransformer to run the script that has been created. The user can request a one-time execution of the script or automatic periodic execution of the script according to a user-specified time table. Script execution results in fresh (not from cache) download of the source document, navigating the source document tree to the selected tree node and copying the selected source document fragment to the target window.
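Periodic execution can be sketched in Python as follows. This is an illustrative sketch only: the function name monitor, its parameters, and the choice to redraw the target only when the digest changes are assumptions and are not taken from the WebTransformer program.

```python
import time


def monitor(run_script, interval_seconds, rounds, on_update, sleep=time.sleep):
    """Replay a script `rounds` times, pausing between runs.

    run_script: callable that downloads the source afresh and returns the digest.
    on_update:  callable that redraws the target window with a new digest.
    sleep:      injectable so the sketch can be exercised without waiting.
    """
    last_digest = None
    for round_number in range(rounds):
        digest = run_script()
        if digest != last_digest:  # assumption: redraw only when the digest changed
            on_update(digest)
            last_digest = digest
        if round_number < rounds - 1:
            sleep(interval_seconds)


# Usage with a stand-in script that returns canned digests.
quotes = iter(["12 3/4", "12 3/4", "13 1/8"])
shown = []
pauses = []
monitor(lambda: next(quotes), 60, 3, shown.append, sleep=pauses.append)
```

A one-time execution corresponds to rounds=1; a user-specified timetable would replace the fixed interval.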


Summary of Benefits
The present invention brings the following benefits to its user:
1. User views and monitors only the fragments of online documents that are of interest to him or her, not the whole documents.
2. User does not have to push the "Reload" button; it is done for him or her automatically by the WebTransformer.
3. Combination of the typically small size of target windows and the auto-refresh feature allows the user to monitor many (10-50) online documents simultaneously without applying any manual effort.
4. Since the document digest is small and it typically does not contain large pictures or embedded programs (such as JavaScript, Java, ActiveX programs), the document digests download and execute much faster than the original documents.
5. Since document digests are small in size, and since they require less bandwidth and less computational power to display than the original documents, the document digests can be successfully displayed on small-screen user agents that have bandwidth and computational power limitations, specifically on user agents that run on wireless devices such as cellular phones, pagers, wireless personal digital assistants (PDA), and so on. These devices' primary limitation is screen size, so they would greatly benefit from the present invention.

Other objects and advantages of the present invention will be more readily apparent from the following detailed description when read in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 schematically shows two source documents, each shown in a source window, and a target document shown in a target window.
Fig. 2 shows a concrete example of a source document from the financial web site contained in a source window and the document digest of this document shown in a target window.
Fig. 3 shows a concrete example of a source document obtained from a shipping company and the digest of this document monitored in a target window. It also shows several other target windows that monitor other source web pages with their source windows hidden.
Fig. 4 shows a partial source document tree for the source document shown in Fig. 2.
Fig. 5 shows a WebTransformer script that extracts the document digest from the source window and shows it in the target window in Fig. 2.
Fig. 6 shows a block diagram for client-server WebTransformer setup.
Fig. 7 shows a block diagram of communicating devices for use in a wireless device application according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION
Windows
In the preferred embodiment a user typically observes two windows per instance of the WebTransformer script:
1. Source Document Window. This window contains the source online document that is displayed using a regular web browser such as Microsoft Internet Explorer. This window is used to navigate to the online document that will be monitored and to select a fragment of the online document to be monitored.
2. Target Document Window. This window is where the digest of the source document appears. This window is usually smaller than the source window and it typically has no window treatments such as menu bars, control box, or scroll bars.
When a WebTransformer script is recorded, both source and target windows are displayed. When the recorded script is replayed, the user has the option of displaying both source and target windows or only the target window. Typically the user does not display the source window at script replay time.
If the target document is assembled from several source documents, then several source windows may be displayed. However, each WebTransformer script typically has only one target window associated with it.
The goal of this design is to keep target windows as small as possible so that several such windows monitoring different documents can be placed on the screen without overlapping each other. Target windows also can be placed on the system tray of the Microsoft Windows® operating systems.
Figure 1 schematically shows two source documents in source windows and one target document in the target window. Source document 1 is displayed in the source window 10. Source document 2 is displayed in the source window 20. The target document is displayed in the target window 30.
Figure 2 contains the actual screen shot of the working WebTransformer. It shows the source window 10 on the left that contains the source online HTML document from the web site at "http://www.quicken.com/" that contains a detailed stock quote for CyberCash® Inc. Note that the "Last Trade" digits "12 3/4" (30) are highlighted to show that these digits constitute the document fragment selected by the user.
The small window 20 on the right is the target window that shows the target online document that contains the same digits "12 3/4" (40) that constitute the target document fragment that was copied from the source document fragment 30. The target window title contains the name of the WebTransformer script that created the target document and the time when the script was last run.
Figure 3 shows the web page (online document) 10, in this case depicting a FedEx® Corp. web page that is used to track air shipments. A user selected web page fragment 30 that contains the latest event that happened to the user's shipment. This fragment is copied to the target window 20 where it is shown as the document fragment 40.
Also shown in Figure 3 are unrelated WebTransformer target windows 50, 60, and 70 that track other web sites. Specifically, window 50 tracks a stock quote taken from a financial services web site, window 60 tracks a particular lot price from the online auction, and window 70 tracks weather in New Jersey from a weather web site.
The source windows that correspond to these target windows are hidden on instruction from the user.
Source Document Tree and DOM
We use a tree representation of the source online document in creating the transformation script according to the present invention. In the document tree each logical unit of the document, such as a paragraph, table, heading, or emphasis, is represented by a node. Node A is a child of node B if and only if the document fragment represented by node A is directly embedded into the document fragment represented by node B.
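The parent and child relation described above can be illustrated with a small Python sketch built on the standard library html.parser module. It is a simplified stand-in for a real DOM implementation; the Node and TreeBuilder classes and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One logical unit of the document (hypothetical minimal tree node)."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree in which directly embedded elements become children."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


doc = parse("<body><p>Last Trade <b>12 3/4</b></p></body>")
body = doc.children[0]
p = body.children[0]
b = p.children[0]
```

In this sample the B node is a child of the P node because the bold fragment is directly embedded in the paragraph, and the P node is in turn a child of BODY.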
The most popular implementation of the online document tree model for HTML and XML online documents is the Document Object Model (DOM) (see http://www.w3c.org/ for details). The Document Object Model is implemented in modern browsers such as Microsoft Internet Explorer ver. 5 or Netscape Navigator ver. 5. The preferred embodiment of this invention uses DOM as a source document tree model. Other embodiments of this invention can use different tree models for representing the source document.
Figure 4 shows a partial document tree for the source document 10 from Figure 2 (the complete tree is too big to show on one page). The root of the tree contains BODY element 10 that represents the body of the document. The B (for bold) node represents HTML element B that contains the user-selected document fragment 30 in Figure 2. The path consisting of tree nodes 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, and 41 leads from the root of the tree to the tree node 20.
Creating the Script
A script that performs online document transformation according to this invention (also called WebTransformer Script, or WTS) is created in the following manner.
A source document is displayed in the first window 10 of Figure 1. The first window 10 is herein referred to as a source window 10. The transformed (target) document is displayed in a second window 30. The second window 30 is herein referred to as a target window 30.
A user can select a source document fragment by clicking the desired fragment using a computer pointing device such as a mouse. The selected source document fragment is highlighted. Then, using keys of a computer keyboard, the user can expand or contract the selected fragment. In Figure 1, a fragment 15 is shown as being selected.
Once the fragment 15 is selected, the user can copy the fragment 15 to the target window 30 by selecting the "Copy" user interface command from the graphical menu of commands, and a copied fragment then appears in the target window 30 as a target fragment 31. The user can then proceed, for example, to another online document 20, select a fragment 25 therein and copy it to another target location 32 in the target window 30.
The script that downloads the source document and transforms its fragment into the fragment in the target document is created according to the following rules:
1. Add to the script the "Go To URL" command that causes the browser in the source window to navigate to the source document. The location of the source document includes the URL address. The location information can also include additional data that needs to be passed to the web server to cause displaying of the page selected by the user, such as post data and headers.
The command 10 of the sample WebTransformer script shown in Figure 5 causes the browser to navigate to the address "http://www.quicken.com/investments!quotes!?symbol~ych". This sample script transforms the source document 10 in Figure 2 to the target document 40.
2. Add to the script a sequence of "Go To Child" commands that lead from the root of the downloaded document tree to the tree node that represents the document fragment selected by the user for monitoring.
Creation of the command sequence starts with finding the tree node that corresponds to the document fragment selected by the user. WebTransformer asks the DOM implementation to compute the minimal HTML element that covers the selection made by the user in the document. A single mouse click is treated as a selection of zero width.
Then we use parent links to walk up from the selected node to the root node.
While walking, we record the indices of nodes in their parents, so that the recorded path can be walked again from the root when the document is reloaded.
For instance, the commands 20, 21, 22, 23, ..., 30, 31 in Figure 5 walk the tree node path from the root node 10 in Figure 4 to the user-selected node 20, and on the way they pass tree nodes 31, 32, 33, ..., 39, 40, and 41.
3. Add to the script the "Copy To Target" command. Creating the script in the case of multiple source pages requires the "Copy To Target" command to be qualified by the target ID in the target document.
For instance, in Figure 5 the "Copy To Target" command 40 finishes the script by copying the user-selected source document fragment to the target document.
The formal algorithm for the script creation is as follows:
Input: tree node selected_element that is a part of the source document tree.
Output: the script object that is a list of commands.
0. Create empty script object.
1. Add "Copy To Target" command to the script object.
2. Set variable e, which refers to the current tree node, to selected_element.
3. Do while e is not NULL
3a. If e.tag is equal to "BODY" or e has no parent then Exit this Loop
3b. Create "Go To Child" command object.
3c. Node p = e.parent
3e. Compute integer ix which is equal to the index of node e in the node p. Index of the first child is 0, index of the second child is 1, and so on.
3f. Store ix in the command.
3g. Add command before the first command in the script.
3h. Set e = p, so that the next iteration continues from the parent node.
3x. EndDo
4. Add "Go To URL" command that navigates the browser to the user-selected source page before the first command in the script.
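The script creation algorithm above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` class is a hypothetical stand-in for a document tree node (it is not a real DOM API), commands are modeled as simple `(name, argument)` tuples, and the URL is a placeholder. The loop moves from each node to its parent on every iteration, which is how the walk up to the root described earlier is assumed to proceed.

```python
from typing import List, Optional, Tuple


class Node:
    """Hypothetical minimal document tree node (not a real DOM class)."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


Command = Tuple[str, object]  # (command name, argument)


def create_script(selected_element: Node, url: str) -> List[Command]:
    """Build the command list for one user-selected tree node."""
    script: List[Command] = [("Copy To Target", None)]
    e: Optional[Node] = selected_element
    while e is not None:
        if e.tag == "BODY" or e.parent is None:
            break
        p = e.parent
        # Index of e among the children of p (first child is 0).
        ix = next(i for i, c in enumerate(p.children) if c is e)
        # Prepend, so the finished path reads from the root downward.
        script.insert(0, ("Go To Child", ix))
        e = p
    script.insert(0, ("Go To URL", url))
    return script


# Small example tree: BODY -> [IFRAME, TABLE -> TR -> [TD, TD -> B]]
body = Node("BODY")
body.add(Node("IFRAME"))
table = body.add(Node("TABLE"))
row = table.add(Node("TR"))
row.add(Node("TD"))
bold = row.add(Node("TD")).add(Node("B"))

script = create_script(bold, "http://example.com/quote")
for command in script:
    print(command)
```

Because each "Go To Child" command is inserted before the first command, the commands end up in root-to-fragment order even though the tree is walked from the fragment upward.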
A recorded script can be saved in a computer file and later loaded from that file.
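As an illustration of saving and loading a script, the sketch below writes a command list to a file and reads it back. The text does not specify a file format, so the JSON layout, the field names (`command`, `url`, `index`, `target_id`) and the file name are all assumptions made for this example.

```python
import json
import os
import tempfile

# A recorded script as a list of command records (hypothetical layout).
script = [
    {"command": "Go To URL", "url": "http://example.com/quote"},
    {"command": "Go To Child", "index": 1},
    {"command": "Go To Child", "index": 0},
    {"command": "Copy To Target", "target_id": "quote"},
]


def save_script(commands, path):
    """Write the command list to a file as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(commands, f, indent=2)


def load_script(path):
    """Read a command list back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "example_script.json")
save_script(script, path)
restored = load_script(path)
print(restored == script)
```

A text-based format of this kind would also make it easy to transfer a script to another computer, for example by e-mail.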
Running the Script
The user can instruct the WebTransformer according to the present invention to run the created script or, alternatively, to run a script loaded from a file. The WebTransformer then executes the sequence of commands contained in the script, thus causing the source document(s) to be downloaded from the Internet, and fragment(s) of these documents to be selected and copied to the target window. All this happens automatically according to the recorded script.
The user can either run the script once or instruct the WebTransformer according to the present invention to run the script automatically according to a timetable set by the user (for instance, every 5 minutes). The script can be run on the same desktop computer where it was created, or the script can be transferred to another computer (for example, by downloading, uploading or e-mailing it) and run there. The other computer may be another computer belonging to the user or a server computer which can run this script on a request from a client.
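Replaying a script can be sketched as a small interpreter over the command list. The sketch below is a simplified illustration, not the WebTransformer implementation: the document tree is modeled as nested dictionaries, and `load_document` is a hypothetical caller-supplied function standing in for the browser download, so the example runs without network access.

```python
def run_script(script, load_document):
    """Replay the commands and return the target as {target_id: node}."""
    target = {}
    node = None
    for name, arg in script:
        if name == "Go To URL":
            node = load_document(arg)      # root of the downloaded tree
        elif name == "Go To Child":
            node = node["children"][arg]   # descend by recorded child index
        elif name == "Copy To Target":
            target[arg] = node             # copy the fragment to the target
        else:
            raise ValueError(f"unknown command: {name}")
    return target


def fake_load(url):
    """Stand-in for downloading and parsing the source document."""
    return {"tag": "BODY", "children": [
        {"tag": "IFRAME", "children": []},
        {"tag": "TABLE", "children": [
            {"tag": "B", "text": "172.50", "children": []},
        ]},
    ]}


script = [
    ("Go To URL", "http://example.com/quote"),
    ("Go To Child", 1),
    ("Go To Child", 0),
    ("Copy To Target", "price"),
]

digest = run_script(script, fake_load)
print(digest["price"]["text"])
```

Running on a timetable could then be a matter of calling `run_script` repeatedly with a delay between calls; that scheduling loop is left out here.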
Why the Tree?
Every time we reload the source document, there is no guarantee that it will be the same as the previously loaded document or that it will even be close to the previously loaded document. Many things can change even in the relatively stable documents generated from online databases. (1) Advertising banners that appear on most web pages change every time the page is loaded, and they may have a complicated internal structure that is different for every ad that is displayed. (2) Certain non-advertising items may substantially change too. For example, in Figure 2 there is a list of "Recent Headlines". The number of elements on this list and the composition of this list may substantially change every few hours as new headlines for the company appear and old headlines are removed. Also, the list of available site features ("Chart", "Intraday Chart", "News", "Evaluator" and so on) changes approximately once every month as the site implements new features and removes old features.
So, to be able to find the user-selected fragment of the changed source online document, we need to rely on a document model such that the algorithm for getting to the user-selected fragment is the least affected by changes in the other parts of the document. The Document Tree is the document model that was selected for use in the present invention, because it provides a good degree of independence of the transformation script from the document changes.
Tree nodes and their children that are not on the path from the root to the user-selected node may change, and their change will not affect the path to the user-selected element, so the script that locates this element will still work. For example, in Figure 4 nodes 51 and 52 are likely to contain the changing content, because they are related to advertising banners that are often put into IFRAMEs. But these nodes are not on the path from the root node 10 to the user-selected node 20, so even if the entire content of these nodes changes, the transformation script built according to the present invention will still be able to find the user-selected element 20 in the new document tree.

However, if nodes 51 or 52 in Figure 4 are removed entirely, then the WebTransformer script will not be able to get to the user-selected node 20, since removing a node can shift the child indices recorded along the path.
Therefore repeated running of these transformation scripts in order to obtain an updated digest of the updated source online document substantially relies on the assumption that the path from the root node to the user-selected fragment node will not change in the new document.
This is typically the case for frequently updated online documents, because such documents are generated automatically by a web server program that uses the same template each time.
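The following sketch illustrates both behaviors of a recorded child-index path: it survives content changes in nodes that are off the path, and it breaks when a node is removed entirely. The trees are made-up examples built with a hypothetical helper `n`; they are not the trees of Figure 4.

```python
def n(tag, *children, text=""):
    """Hypothetical helper that builds a tree node as a dictionary."""
    return {"tag": tag, "text": text, "children": list(children)}


def follow(root, path):
    """Walk from the root along the recorded child indices."""
    node = root
    for ix in path:
        node = node["children"][ix]
    return node


path = [1, 0]  # recorded once: second child of BODY, then its first child

# Version 1: the document as it looked when the script was recorded.
v1 = n("BODY", n("IFRAME", n("IMG")), n("P", n("B", text="172.50")))
# Version 2: the banner inside IFRAME has a different internal structure.
v2 = n("BODY", n("IFRAME", n("DIV", n("A"), n("IMG"))), n("P", n("B", text="173.25")))
# Version 3: the IFRAME node has been removed entirely.
v3 = n("BODY", n("P", n("B", text="174.00")))

first = follow(v1, path)["text"]
second = follow(v2, path)["text"]

try:
    follow(v3, path)
    path_broken = False
except IndexError:
    path_broken = True

print(first, second, path_broken)
```

In the third version the node that used to be at index 1 is now at index 0, so the recorded path no longer leads to the selected fragment.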
Client-Server WebTransformer
In the present invention, as described above, displaying of the document digest occurs in the same process and on the same computer that runs the WebTransformer script and performs the transformation. Under certain circumstances it becomes necessary to separate the document digest displaying function from the document digest creation function, so that these functions may be executed on different computers. Then the program that displays the document digest is called the WebTransformer client, and the program that performs the online document transformation according to the present invention is called the WebTransformer server.
See Figure 6 for a schematic drawing of the client-server setup. The WebTransformer client 10 sends a request to get the fresh document digest to the WebTransformer server 20, which in turn sends a request to download the source online document to the web site 30. When the source online document 50 is returned from the web site 30 to the WebTransformer server 20, the server performs the source document transformation and document digest creation according to the script prepared by the user and uploaded to the server, and the resulting document digest 40 is sent back to the requesting client.
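The request flow described above can be sketched with two small classes. This is an in-process illustration only: the class names `DigestServer` and `DigestClient`, their methods, and the `fetch` function are hypothetical, and a real deployment would place a network protocol between client and server, which the text does not specify.

```python
class DigestServer:
    """Holds uploaded scripts and produces digests on request."""

    def __init__(self, fetch):
        self._fetch = fetch    # fetch(url) -> root of the document tree
        self._scripts = {}

    def upload_script(self, name, url, path):
        """Store a script as a source URL plus a child-index path."""
        self._scripts[name] = (url, list(path))

    def get_digest(self, name):
        """Download the source document and extract the fragment text."""
        url, path = self._scripts[name]
        node = self._fetch(url)
        for ix in path:
            node = node["children"][ix]
        return node["text"]


class DigestClient:
    """Only requests and displays digests; it never sees the full page."""

    def __init__(self, server):
        self._server = server

    def show(self, name):
        return f"{name}: {self._server.get_digest(name)}"


def fake_fetch(url):
    """Stand-in for the web site returning the source document."""
    return {"tag": "BODY", "children": [
        {"tag": "IFRAME", "children": []},
        {"tag": "B", "text": "172.50", "children": []},
    ]}


server = DigestServer(fake_fetch)
server.upload_script("quote", "http://example.com/quote", [1])
client = DigestClient(server)
line = client.show("quote")
print(line)
```

The point of the split is that downloading and transforming the full source document happen on the server side, while the client only handles the small digest.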
The client-server WebTransformer can be used in the following situations:
1. The WebTransformer client is located on a small-screen handheld or wireless device. A wireless provider or individuals themselves set up a WebTransformer server and put their WebTransformer scripts on it. The wireless device client connects to this server to get the document digests. This setup is described in more detail below.
2. A company sets up a firewall that does not give company employees any access to the outside Internet but uses Internet web sites to feed only the approved information to the employees. The company sets up a WebTransformer server 20 and puts on it a number of WebTransformer scripts that extract and reformat the approved data from the Internet. The access to the outside Internet is closed to employees, but they can use their WebTransformer clients 30 to view the approved document digests from the WebTransformer server 20.
3. A company sets up a WebTransformer server that monitors a particular web page or an assortment of web pages that are of interest to the company. The document digests extracted by WebTransformer scripts are read by a robotic client that converts them to text and stores them in a database. This is a good way to arrange important data extraction through the web site.
Handheld and Wireless Devices
The document digest produced by a WebTransformer script is usually smaller than the original document, and it usually does not contain computationally intensive and bandwidth-intensive multimedia elements such as graphics, sounds, scripts, and applets. This lowers the screen size, bandwidth and processing power requirements for user agents that receive and display such document digests.
Since handheld and wireless devices such as screen cell phones, pagers and personal digital assistants (PDAs) all have small screens, and most of them also have limitations in available bandwidth and processing power, it is more appropriate to use such devices for online document monitoring using the present invention than for web browsing. A complete web browser for such devices, even if developed, would not be very practical, because most web pages are designed for large desktop screens and not for the small screens used in handheld and wireless devices.
Therefore viewing a web page designed for a big screen will not be convenient on the small screen of a handheld device, and developing a small-screen version of every web page out there is impractical.
The present invention provides a way of monitoring small fragments of larger web pages on a handheld or wireless device with a small screen. A preferred scheme of using the present invention to monitor the fragments of web pages on a small-screen device with limitations in available bandwidth and computational power is presented in Figure 7.
In this scheme, a user creates scripts according to the present invention on his or her desktop computer 60 in Figure 7. The created scripts are uploaded to the central server computer 20 of the wireless provider over the user-desktop-to-wireless-provider connection 70, which typically is a dialup connection.
The handheld device 10 can communicate with the central wireless computer 20 over a relatively slow wireless or similar link 40. The handheld device can download a list of available WebTransformer scripts that the user uploaded to the central computer. On instruction from the user, the handheld device 10 can ask the central computer 20 to run the transformation script and to send the digest document produced by the script to the handheld device, where the results are shown as the document digests 11 and 12.
This way, communications that require potentially high bandwidth, such as downloading the source online document from the web site 30 to the central computer, will occur over the fast communication link 50 that typically exists between server computers. All operations related to the source page downloading and transformation that potentially require higher computing power will occur on the central computer 20. The handheld device 10 will only need to download a small digest document over the slow link 40, and it will show the smaller digest document 11 or 12 on its small screen.

Also, the user can ask the central server computer 20 to send the user a target document only when it changes. This way, even fewer bytes have to be sent between the central computer and the wireless device.
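One possible way to send a target document only when it changes is to remember a hash of the last digest sent for each script and to compare it with the hash of the newly produced digest. The text does not say how changes are detected, so the hashing approach and the `ChangeNotifier` class below are assumptions made for illustration.

```python
import hashlib


class ChangeNotifier:
    """Remembers the last digest sent per script (hypothetical helper)."""

    def __init__(self):
        self._last_hash = {}

    def maybe_send(self, script_name, digest_text):
        """Return the digest if it differs from the last one sent, else None."""
        h = hashlib.sha256(digest_text.encode("utf-8")).hexdigest()
        if self._last_hash.get(script_name) == h:
            return None            # unchanged: nothing goes over the slow link
        self._last_hash[script_name] = h
        return digest_text


notifier = ChangeNotifier()
first = notifier.maybe_send("quote", "172.50")   # new digest, sent
second = notifier.maybe_send("quote", "172.50")  # same digest, suppressed
third = notifier.maybe_send("quote", "173.25")   # changed digest, sent
print(first, second, third)
```

Storing only a hash keeps the server-side state small; storing the previous digest itself would work equally well for short digests.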


Claims (20)

1. A method for extracting digests from structured online documents and monitoring the said digests, comprising the steps of:
recording the script that consists of commands that include loading the online document in the source window, navigating the tree of the source online document, and copying fragment of the online document to the target window;
saving the script in a computer-readable medium; and
replaying the script using a computer to automatically generate an updated target document from an updated source document.
2. A method as claimed in claim 1, wherein the structured online documents from which information is to be extracted include any document that has hierarchical internal structure that can be represented by a tree.
3. A method as claimed in claim 1, wherein the method employs a visual programming technique.
4. A method as claimed in claim 3, wherein the visual programming technique provides for at least two windows being logically present for each script: a first window as a source window and a second window as a target window.
5. A method as claimed in claim 4, wherein at the time of script recording a user can select a fragment of a source online document shown in a source window by clicking the said fragment and can request creation of a script that finds the selected fragment in the current and subsequent versions of the source document.
6. A method as claimed in claim 5, wherein at the script creation time a sequence of commands that comprise the script that extracts the selected source document fragment is generated.
7. A method as claimed in claim 6, wherein the generated sequence of commands includes document tree navigation commands that lead from the root node of the source document tree to the node of the source document tree that represents the fragment selected by user.
8. A method as claimed in claim 6, wherein the generated sequence of commands further includes "Copy Fragment" command that causes transfer of contents of the selected source document fragment from the source window to the target window.
9. A method as claimed in claim 8, wherein the visual programming technique allows for replaying of the memorized commands at a subsequent time to automatically create a digest of a new version of the specified online document.
10. A method as claimed in claim 9, wherein the digest is typically smaller than the source online document from which it is made, and the digest is a fragment of a source document that is typically made by the user to omit unnecessary and irrelevant graphics and text elements often present in online document.
11. A method as claimed in claim 1, wherein the script can be automatically replayed at predetermined time intervals.
12. A method as claimed in claim 1, further comprising during the step of recording of commands to form a script, identifying a portion of at least one further structured document to be copied to the target document and identifying a placeholder in the target document to which the said fragment is to be copied.
13. A method as claimed in claim 1, wherein the copied document fragment is represented by a node in a tree that represents a structured online document.
14. A method as claimed in claim 1, further comprising during the step of recording of commands to form a script, recording navigation commands that navigate the structured document browser to the source structured document.
15. A method for extracting digests from structured online documents, and automatic monitoring of the said digests based on visual programming of document tree navigation and transformation, whereby structured online document is any document that can be stored in a computer and that has a hierarchical structure that can be represented by a tree, comprising the steps of:
recording of commands to form a script that identifies a fragment of a structured document to be copied from source document to target document;
saving the said script in a computer-readable medium; and
replaying the script using a computer to automatically generate an updated target document from an updated source document.
16. A method as claimed in claim 15, wherein a technique is provided whereby for each script at least two windows are logically present: a first window as a source window and a second window as a target window, and wherein the technique allows a user to select a fragment of an online document shown in a source window and to create a script that copies the selected fragment to the target window.
17. A method as claimed in claim 16, wherein the technique generates a sequence of the source document tree navigation commands that lead from the root node of the source document tree to the node of the source document tree that represents the document fragment selected by user.
18. A method as claimed in claim 17, wherein the technique further includes "Copy Fragment" commands that cause the assembly of a document digest in the target window.
19. A method as claimed in claim 18, wherein the technique enables replaying of the memorized commands at a subsequent time to create a digest of a new version of the specified online document.
20. A method as claimed in claim 19, further comprising during the step of recording of commands to form a script, identifying a portion of at least one further structured source document to be copied to the target document.
CA002382969A 1999-08-23 2000-08-23 Method for extracting digests reformatting and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation Abandoned CA2382969A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US14991199P true 1999-08-23 1999-08-23
US60/149,911 1999-08-23
US09/548,718 US6538673B1 (en) 1999-08-23 2000-04-13 Method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation
US09/548,718 2000-04-13
US63448100A true 2000-08-08 2000-08-08
US09/634,481 2000-08-08
PCT/US2000/023140 WO2001014951A2 (en) 1999-08-23 2000-08-23 Method for processing and monitoring online documents

Publications (1)

Publication Number Publication Date
CA2382969A1 true CA2382969A1 (en) 2001-03-01

Family

ID=27386888

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002382969A Abandoned CA2382969A1 (en) 1999-08-23 2000-08-23 Method for extracting digests reformatting and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation

Country Status (4)

Country Link
EP (1) EP1210655A4 (en)
AU (1) AU779907B2 (en)
CA (1) CA2382969A1 (en)
WO (1) WO2001014951A2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1381965B1 (en) 2001-03-23 2018-05-09 BlackBerry Limited Systems and methods for content delivery over a wireless communication medium to a portable computing device
US20030035002A1 (en) * 2001-08-15 2003-02-20 Samsung Electronics Co., Ltd. Alternate interpretation of markup language documents
FR2843640B1 (en) * 2002-08-16 2010-03-19 Systeam Process for generating, transmitting and processing documents
US7493603B2 (en) 2002-10-15 2009-02-17 International Business Machines Corporation Annotated automaton encoding of XML schema for high performance schema validation
US7437374B2 (en) 2004-02-10 2008-10-14 International Business Machines Corporation Efficient XML schema validation of XML fragments using annotated automaton encoding
US20080320169A1 (en) * 2005-01-12 2008-12-25 Ian Shaw Burnett Systems, Methods, and Computer Programs for Enabling a Computing Apparatus to Obtain Data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0529121A1 (en) * 1991-08-24 1993-03-03 International Business Machines Corporation Graphics display tool
CA2204736A1 (en) * 1994-11-08 1996-05-23 Charles H. Ferguson An online service development tool with fee setting capabilities
US5774123A (en) * 1995-12-15 1998-06-30 Ncr Corporation Apparatus and method for enhancing navigation of an on-line multiple-resource information service
US6029182A (en) * 1996-10-04 2000-02-22 Canon Information Systems, Inc. System for generating a custom formatted hypertext document by using a personal profile to retrieve hierarchical documents
US5956709A (en) * 1997-07-28 1999-09-21 Xue; Yansheng Dynamic data assembling on internet client side

Also Published As

Publication number Publication date
AU779907B2 (en) 2005-02-17
WO2001014951A3 (en) 2002-01-24
AU2039001A (en) 2001-03-19
EP1210655A2 (en) 2002-06-05
EP1210655A4 (en) 2006-06-14
WO2001014951A2 (en) 2001-03-01

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead

Effective date: 20150825