WO2004046945A1 - A customized life portal on the internet - Google Patents
A customized life portal on the internet
- Publication number
- WO2004046945A1 (PCT/US2002/040319)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- life
- user
- portal
- life portal
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Definitions
- the present invention relates generally to Internet application software and web site configuration. More specifically, it relates to methods and systems for implementing a portal-type web site on a client computer highly customized for a particular user.
- the content may not be retrievable from the portal or ISP hosting the user's personal web site.
- the range of content available may be limited to the content created or hosted by the portal or made available to the portal (e.g., licensed by the portal or ISP), or may otherwise be from a limited range of sources.
- the portals and ISPs providing the personalized portal service are content aggregators.
- the amount of content that can be aggregated is necessarily limited because most of the content on the Internet is not available for syndication and, therefore, cannot be collected by third parties, such as portals.
- the sources available to the portal are limited to sources licensed for use by the portal and may not have the content the user wants, thereby restricting the level of customization of the personal web pages.
- Meta-browsing technology generally fails to address conflicts and errors that arise when manipulating various types of content and how web sites implement or handle content, such as HTML and JavaScript. This limits the portal's ability to provide content relating to various aspects of a user's life.
- present meta-browsing technology fails to allow users to see the entire range of content from a web page. For example, present meta-browsers only allow users to see content limited to a single table and do not enable the user to see complete portions of a web page. Present technology also often fails to maintain and consistently display tables in views.
- meta-browsing technology is not efficient at locating content that a user will likely want to follow in order to stay current on the user's interests.
- meta-browsing technology is often difficult and cumbersome to use, making it inaccessible to the majority of non-technical users.
- What is needed is a truly customized, personal web site that can be created and maintained in an efficient and intuitive manner. It would be desirable to allow a user to create a truly personal web site or portal that, at a high level, reflects the user's life and who that person is; that is, web pages that present the user with content, such as views into user-selected web sites and topical magazines, that are of direct interest to the user.
- the life portal should accept content from sites not previously visited or not known to a service provider and modify the content so that there is a high probability that the content will be displayed without problems and, generally, function in a manner consistent with user expectations. This should also be done transparently to the user, insofar as the user should not be required to intervene in the process of modifying and displaying the content.
- a user life portal on the Internet built from the bottom up is described.
- the life portal has at least one life page and at least one view contained within the life page, wherein the view is content relating to a user's interest.
- a life page contains at least one magazine wherein the magazine contains text and links directing a user to articles relating to a user's interest.
- a user can create a pixel view which is any portion of content chosen by a user from a web page scraped from a web site.
- a user can create a parsed view which is content consisting of a table from a web page scraped from a web site.
- a user can create a magazine containing text, such as headlines or titles of articles and links to those articles, on a specific topic selected by the user. Magazine content is derived from the life portal service provider determining what themes are present in an article, clustering the themes so that they are logically indexed, and storing the themes and links to the articles in the service provider database.
- This content is contained in portlets which, generally, are contained in life pages.
- Content can also be stored in a container referred to as a persistence panel in a life portal.
- Content is fetched and scraped from virtually any web site on the Internet and is not restricted to web sites having a previous relationship with the service provider, such as a licensing agreement or contract, nor is it restricted to content created, produced, or commissioned by the service provider.
- the life portal service provider offers a user pre-created views and magazines on topics in which the service provider believes its users may have an interest or which the service provider believes contains high-quality content.
- a method of implementing a life portal on the Internet is described.
- a life portal is created on a client computer whereby a life portal applet is embedded in a browser on the computer when code necessary for retrieving and parsing HTML for the life portal is initially installed or created on the client.
- the creation and use of the life portal does not require that any application be downloaded or installed on the client.
- requests for content from web sites are made from the client computer, using the client's IP address, cookies, and so on, rather than from the life portal service provider servers.
- the applet performs several functions, such as parsing the content, performed by a parsing engine, and determining the appropriate rules and applying those rules to the content, performed by a rules engine.
- the life portal uses controls, such as ActiveX or controls having similar functionality, to retrieve content from a site and have the content displayed in a life page. In this embodiment, no applications or applets are installed on the client or client browser.
- the life portal service provider retrieves content for a life portal user. The content is parsed on the service provider servers and the rules are applied to the content before it is transmitted to the client computers for display in the life portals as views.
- the life portal service provider server performs most of the processing of the content, stores cookies and other user security data, and uses its own IP address when retrieving content. The server also caches content from sites that are accessed often.
- a user is able to access her life portal from any computer equipped with a browser and capable of accessing the life portal service provider site.
- An applet or portal engine determines whether a domain from which content is being retrieved is listed in a domain table. If the domain is listed in a domain table, a corresponding rule set is identified. This is done by examining a domain/rules mapping table which associates domains with rules. The rules are applied to content from the domain.
- the content is HTML and the rules modify the HTML and other code, such as JavaScript, such that displaying a portion of the HTML in a view, separated from its original context, does not cause problems or breakdowns.
- a default set of rules is applied to the web page. If the default rules do not address all problems arising from displaying the view, new rules are added to a rule set maintained by a life portal service provider. The rule set is expected to grow as new domains are added to the domain table. The application of the rules to content in this manner enables the display of HTML in views with minimum issues and breakdowns when a user sees the content in a view in a life page.
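The domain lookup and default-rule fallback described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the rule format (regular-expression substitutions), the table names `DOMAIN_RULES_MAP` and `RULE_SETS`, the rule-set id `rs-1`, and the domain `news.example.com` are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule format: a (regex pattern, replacement) pair applied to raw HTML.
Rule = tuple[str, str]

# Hypothetical default rule set: strip inline scripts that may break out of context.
DEFAULT_RULES: list[Rule] = [
    (r"(?is)<script\b.*?</script>", ""),
]

# Hypothetical domain/rules mapping table: domain -> rule set id.
DOMAIN_RULES_MAP: dict[str, str] = {
    "news.example.com": "rs-1",
}

# Hypothetical rule sets keyed by id.
RULE_SETS: dict[str, list[Rule]] = {
    "rs-1": [
        (r"(?is)<script\b.*?</script>", ""),
        (r'(?i)\starget="_top"', ""),  # keep links from replacing the whole portal
    ],
}


def rules_for(url: str) -> list[Rule]:
    """Return the rule set mapped to the URL's domain, or the default rules."""
    domain = urlparse(url).hostname or ""
    rule_set_id = DOMAIN_RULES_MAP.get(domain)
    if rule_set_id is None:
        return DEFAULT_RULES
    return RULE_SETS[rule_set_id]


def apply_rules(url: str, html: str) -> str:
    """Apply every rule for the page's domain to the fetched HTML, in order."""
    for pattern, replacement in rules_for(url):
        html = re.sub(pattern, replacement, html)
    return html
```

In this sketch, adding support for a new domain means adding one row to the mapping table and, if needed, one new rule set, which mirrors the expectation that the rule set grows as domains are added.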
- FIG. 1 is a hierarchical diagram showing a structure of a life portal in accordance with one embodiment of the present invention.
- FIG. 2 is a diagram showing relationships among a life portal service provider, a life portal user, and third-party web sites providing content for the life portal.
- FIG. 3 is an overview flow diagram of a process of creating a custom life portal from a standard life portal in accordance with one embodiment of the present invention.
- FIG. 4 is a screen display of a life portal and life page showing a magazine and view in accordance with one embodiment of the present invention.
- FIGS. 5A and 5B are screen displays of a life portal showing a menu of actions a user can perform on views, magazines, and life pages in accordance with one embodiment of the present invention.
- FIG. 6 is a diagram showing a server-side implementation of the life portal in accordance with one embodiment of the present invention.
- FIG. 7 is a diagram showing a client-side implementation of the life portal in accordance with one embodiment of the present invention.
- FIG. 8 is a diagram showing a logical representation of sample data sets or tables that may be used to apply rules to content before the content is displayed in a life page in accordance with one embodiment of the present invention.
- a life portal contains one or more storage containers referred to as life pages.
- a life page is a content storage area which, in turn, holds information in the form of magazines and views, both of which are content specifically compiled for a user. Magazines and views are stored in portlets. Thus, a life page may have multiple portlets for storing content.
- the life portal of the present invention reflects the life of a user; it displays content of specific, user-defined interest to selected aspects of the user's life.
- a life portal reflects the wide ranging interests of a user limited only by the content accessible on the Internet and other public and private networks, such as Intranets, virtual private networks, and so on.
- the Internet and browsers are used for illustration; however, other networks, data sources, and user interfaces can be applied to the concepts and implementations described herein for the present invention.
- a user is initially presented with a standard life portal.
- a standard life portal is customized by a user to display content, as views and magazines, most of which is retrieved from the Internet.
- A hierarchy of components comprising a life portal is shown in FIG. 1A.
- At the root of the hierarchy is a life portal 102 on the Internet viewable through a browser.
- Below the life portal are one or more life pages 104.
- There may also be a persistence panel 106, a special type of container storing content that the user views often or would like to see at all times while in the life portal and, therefore, is not ideally suited for storing in a life page.
- a persistence panel contains views and/or magazines that are always displayed on a life portal. Below each life page are portlets 108. Contained in a portlet is content 110, at the bottom of the hierarchy, specifically, views and magazines. The views and magazines can be either pre-created or uniquely created by a user.
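The hierarchy described above (life portal, life pages, portlets, content, plus an optional persistence panel) could be modelled roughly as below. The class and field names are hypothetical, and letting the persistence panel hold content directly rather than through portlets is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    kind: str                  # "view" or "magazine"
    name: str
    pre_created: bool = False  # True if supplied by the service provider


@dataclass
class Portlet:
    content: Content           # a portlet holds one view or magazine


@dataclass
class LifePage:
    name: str                  # user-chosen name shown on the tab
    portlets: list[Portlet] = field(default_factory=list)


@dataclass
class PersistencePanel:
    # Assumption: content sits directly in the panel, always displayed.
    content: list[Content] = field(default_factory=list)


@dataclass
class LifePortal:
    life_pages: list[LifePage] = field(default_factory=list)
    persistence_panel: Optional[PersistencePanel] = None

    def all_content(self) -> list[Content]:
        """Walk the hierarchy top-down and collect every view and magazine."""
        items = [p.content for page in self.life_pages for p in page.portlets]
        if self.persistence_panel is not None:
            items.extend(self.persistence_panel.content)
        return items
```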
- the life portal of the present invention has a user interface designed to enable a user to navigate through the portal and create and retrieve content in an efficient and intuitive manner.
- a life page is represented by a tab icon, resembling a folder tab.
- other graphical icons or designs, such as buttons or menu bars can be used.
- a life portal engine and overall administration and operation of life portals are under control of a life portal service provider.
- content from the Internet is fetched, that is, HTML is retrieved, typically using an HTTP command, from a wide variety of web sites (theoretically any web site on the Internet accessible with a browser), and scraped, as described below.
- the life portal service provider servers store text utilized for indexing magazine content. Techniques for scraping desired or selected content from a larger body of fetched content, e.g., entire HTML pages, from web sites are known to persons of ordinary skill in the field of Internet application programming.
- the service provider is not a conventional content aggregator, limited or restricted to fetching content from only selected content providers or content syndicates having a relationship with the service provider; for example, content in a life portal is not restricted to so-called "walled gardens."
- a user modifies a life portal primarily by creating, deleting, and modifying life pages, views, and magazines.
- a user can change the criteria used by life portal application software to retrieve content from the service provider's databases, thereby changing the views and magazines in the life pages and persistence panel.
- a life portal service provider 202 maintains software and hardware components 204 that power the creation and upkeep of numerous life portals, such as a life portal 206.
- one of the software components 204 is a database containing content, such as news articles and other types of text-based content, scraped from web sites and themed by the life portal service provider using scraping techniques and techniques for determining themes from content, such as methods using key word counts or concepts, as are known in their respective fields.
- the themed content is used to create magazines.
- the range of web sites, such as sites 208a, 208b, and 208c, from which content is scraped is unlimited insofar as the service provider is permitted to access the site and retrieve content.
- Content is retrieved from the third-party web sites and examined by life portal service provider 204 to reveal themes found in the content. This is done for compiling magazines.
- a magazine is comprised of one or more articles having a theme directly relating to the subject matter of the magazine.
- content, primarily articles, is themed and clustered by the life portal service provider or by another appropriate entity capable of performing this function and dispersed to one or more service providers.
- Content appropriate for a magazine can then be identified and compiled into magazine form and displayed in a life page or persistence panel.
- the magazine can be described as having a newsletter-type format.
- a magazine is comprised of one or more content descriptors, where a content descriptor can include the following items: headline or title of article, an abstract of the article, source of the article, date, and, if available, an appropriate photo or graphic.
- a content descriptor in a magazine may have fewer or a greater number of informational items.
- a content descriptor may only have a title and article abstract.
- a content descriptor may have a ranking for the article indicating the degree of relevancy of the article to the magazine topic.
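A content descriptor with optional fields, as described above, might look like the following sketch. The field names, the `link` field, and the rule of listing ranked articles first are assumptions for illustration; the source only lists the kinds of items a descriptor can carry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentDescriptor:
    title: str                         # headline or title of the article
    link: str                          # hyperlink to the article (assumed field)
    abstract: Optional[str] = None
    source: Optional[str] = None
    date: Optional[str] = None
    image_url: Optional[str] = None    # photo or graphic, if available
    relevance: Optional[float] = None  # ranking against the magazine topic


def order_by_relevance(items: list[ContentDescriptor]) -> list[ContentDescriptor]:
    """Ranked descriptors first (highest relevance on top); unranked ones keep
    their original relative order at the end, since sorted() is stable."""
    return sorted(items, key=lambda d: (d.relevance is None, -(d.relevance or 0.0)))
```

Making every item except the title and link optional reflects the statement that a descriptor may carry fewer or more informational items.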
- the service provider does not place any self-imposed restrictions on which sites it can access to fetch content.
- the service provider is not limited to content hosted, licensed, or created by content providers or content syndicators or other hosting or sponsoring entity.
- the service provider will select which web sites are accessed from any that are available on the Internet.
- the user can request that the service provider access specific sites to fetch content in which the user has a specific interest.
- the service provider will consider the request and make a decision as to whether to access the sites.
- the service provider may place reasonable restrictions on which sites it will access, such as refusing to access pornographic sites or sites that contain content not legally obtained by the sites, such as copyright protected content requiring a license.
- FIG. 3 is an overview flow diagram of a process of creating a personal life portal in accordance with one embodiment of the present invention.
- a user goes to the life portal service provider web site on the Internet.
- the user's Internet service provider provides a link to the life portal service provider registration page.
- the life portal may be a tool or feature offered by an ISP to its subscribers and is powered by the life portal service provider.
- the user creates a password and completes other administrative steps as required by the ISP or life portal service provider.
- the user begins the process of creating a customized life portal.
- One of the primary goals of the present invention is to allow the user to create a portal that closely reflects various aspects of the user's life.
- the user is presented with a blank life portal screen.
- the user responds to queries which are examined by the life portal service provider so it may provide the user with pre-created life pages.
- the content in the pre-created life pages includes content from sites that the service provider believes can provide high quality content or content that will likely be of interest to many of its users.
- the present invention enables a user to build a life portal dynamically from the bottom-up; that is, the user builds a unique and customized life portal to match her interests and specific needs. This is done by retrieving content from the service provider database that has been themed and clustered for inclusion in magazines and by retrieving content from an unlimited range of web sites on the Internet for views.
- the user creates a truly unique portal that is closely tailored for her and reflects the various aspects of her life.
- a life portal has only content that is of interest to the user, effectively, only content selected by the user.
- a life portal may contain some content not selected by a user, such as a life page containing content selected by a life portal service provider or a sponsoring ISP. However, the amount of such content is insignificant compared to the content selected by the user and would normally be limited to one life page.
- Content is scraped from a wide range of web sites by a portal engine and, with respect to magazines, themed and clustered based on the subject matter of the content.
- the portal engine can fetch content from any web site accessible through a browser or any other type of user interface capable of accessing content on the Internet or public or private network.
- the portal engine scrapes web sites and places the scraped content in document roots or buckets.
- various types of data formats or data in other types of markup languages from data sources besides the Internet can be retrieved.
- the content is scraped from pages at the sites whose content is updated regularly.
- a theme may exist in a database without content associated with the theme. Whenever content is pulled from the database, the content themes are pulled as well. This is done using algorithms known in the field of computer programming. After content is themed and before the content and the theme identifiers are stored in the database, the content is clustered with existing content based on the content's themes. Newly scraped content may have more than one theme in which case a link to the content resides in more than one location in the clustering hierarchy. New content is clustered with existing content using algorithms known in the field of computer programming. By clustering content themes, the portal engine can retrieve all content relevant to a particular topic. This process is used in compiling articles for magazines.
- Clustering is a process of organizing multiple pages from one or more sites in a hierarchy based on themes in the pages. As such, pages or, more generally, content, are clustered as opposed to themes. Thus, a cluster of pages, a page being one form of content, shares at least one common theme.
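The source leaves the theming and clustering algorithms to known techniques. As one hedged illustration, the sketch below themes text by keyword counts (one of the methods the document mentions) and files a link under every theme found, so that content with more than one theme appears in more than one place. The keyword lists, the `min_hits` threshold, and the flat theme-to-links index (instead of a full hierarchy) are simplifying assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical theme vocabulary; a real system would use a far richer model.
THEME_KEYWORDS: dict[str, set[str]] = {
    "basketball": {"nba", "playoffs", "dunk"},
    "finance": {"stocks", "earnings", "market"},
}


def themes_of(text: str, min_hits: int = 2) -> set[str]:
    """Return every theme whose keywords occur at least min_hits times in text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if sum(words[k] for k in keywords) >= min_hits
    }


class ThemeIndex:
    """Flat cluster index: theme -> links to the content sharing that theme."""

    def __init__(self) -> None:
        self._links_by_theme: dict[str, list[str]] = defaultdict(list)

    def add(self, link: str, text: str) -> set[str]:
        """Theme the text, then cluster its link under each theme found."""
        found = themes_of(text)
        for theme in found:
            self._links_by_theme[theme].append(link)
        return found

    def links_for(self, theme: str) -> list[str]:
        """Retrieve all content links relevant to a topic, e.g. for a magazine."""
        return list(self._links_by_theme.get(theme, []))
```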
- the portal engine will determine themes in each story.
- a user creates a magazine by inputting search criteria defining the subject matter of the magazine or, in other embodiments, the sources, such as sites, from where the service provider may retrieve content relevant to the subject matter of the magazine.
- the life portal service provider can also create magazines for its users. A pre-created magazine is created in the same way as regular magazines except the service provider first identifies each magazine source or web site.
- a pre-created magazine on professional basketball may have as sources NBA.com, the NBA page of the ESPN.com web site, and the NBA page of the FoxNews.com web site.
- These three sources are content sources that the service provider can use to create a magazine which it makes available to life portal users.
- the service provider fetches content for magazines from sites it decided to use for magazine content. It is expected that this pool of sources will expand with time and resources.
- a user can request that the life portal service provider or sponsor go to specific web sites to obtain content for a magazine on a subject of interest to a user and add those sites to the list of sites to be themed by the service provider.
- the service provider can make the final decision as to whether it will fetch, scrape, and theme content from the one or more sites requested by the user. It is expected that the service provider will accommodate user requests for adding new sites as sources for magazine content as is practical and feasible.
- a user has the ability to create views of multiple lesser known sites which provide content that may not be available at many of the major portals and web sites, such as Excite, MSN, or Yahoo.
- a user can also create magazines containing content on any topic of interest to the user. Magazines contain links to textual content and associated pictures, such as news stories relevant to the topic chosen by the user, and that the service provider has themed.
- a user can also select content from pre-created views and magazines created by the life portal service provider. These pre-created views and magazines contain content that may be of interest to a wide range of users or may be high-quality content that the service provider believes would appeal to its users.
- the user launches a search of the content already scraped, themed, clustered, and indexed by the service provider.
- the user is not restricted to a so-called "walled garden," a limited collection of web sites, when retrieving content.
- the user may also request that specific web sites be scraped for content.
- the user begins creating life pages.
- the user is presented with an empty life page that can be described as a canvas on which a user will configure and arrange content, namely, views and magazines.
- the initial life pages are created by selecting categories from a list of pre-defined categories supplied by the service provider or by responding to queries posed by the service provider to efficiently determine the user's interests.
- the user can assign essentially any name to the life pages.
- the names are displayed on tabs or other graphical icons or designs. In the described embodiment, the names are always displayed on a life portal regardless of which life page is displayed.
- the user provides criteria for populating a life page with content.
- the user can populate life pages with content as desired without significant constraints imposed by the life portal service provider.
- the content can fall under any topic selected by the user, and may be a specialized or obscure topic. This approach to populating life pages with content reinforces the concept of building a life portal from the bottom up to uniquely match the interests and priorities of each user.
- one type of content is a magazine.
- the user selects a life page and creates a magazine on a particular topic presumably falling under the subject matter of the life page.
- the portal engine compiles the magazine for the user by searching for articles on the topic from the themed content on the life portal service provider content databases.
- a user can suggest or request that content at those sites be scraped so it is available for inclusion in a magazine.
- the service provider decides which sites will be examined for content to ensure that proscribed content is not accessed from the service provider's databases.
- the most relevant segments of the content are located at various web sites and aggregated to create a magazine. In any case, headlines of news stories and other types of text articles with hyperlinks from the various sites are combined to create the magazine.
- the magazine is highly tailored and unique to the user.
- a view is content from a single web site and allows a user to see a portion of a third-party web site without leaving the user's life portal.
- a portion of the third-party web site is a component in the user's life portal and viewable using a meta-browser.
- Views and magazines are stored in portlets.
- portlets allow users to see views via a meta-browser, a browser nested within the user's browser used to see content from another web site.
- other types of content or tools, such as video or JavaScript, can be contained in a portlet.
- the process of initially populating a life page with content is complete.
- the process is then repeated for other life pages at step 310.
- the user can also show or hide a persistence panel in the life portal regardless of which life page is displayed.
- One of the goals of the present invention is to create a life portal using views and magazines stored in life pages that closely reflect the unique personality, interests, preferences, and so on of a particular user.
- the life pages, views, and magazines of individual life portals can vary widely.
- the life portal service provider may also allow the user to modify, to some degree, the look and feel of the life portal.
- One aspect of a life portal is that it allows a user to see numerous views and magazines from different life pages simultaneously.
- a life page can be described as a folder for views and magazines which, from the user's perspective, share a common subject or topic.
- a life page is given a title by the user, which may be any name desired by the user, a feature that further emphasizes the concept of the life portal reflecting the user's personality, life, and interests.
- a life page is essentially a container or folder with a user-selected name and, therefore, has no significance or use if not populated with content.
- content is either a view or a magazine.
- Views and magazines are created on topics selected by the user.
- the user can create a view that is content from the "Hollywood Reporter" web site, another view that is content from the "Variety" web site, and so on.
- the user can also create a magazine that contains headlines and links to articles on movies by a particular studio.
- the articles and text-based content will come from various web sites, thus, a magazine is the appropriate medium for this content.
- the user can assign any name to a life page, as well as to views and magazines.
- a life page can also be pre-created by the service provider and contain pre-created views or magazines.
- Pre-created life pages, views and magazines are components that the life portal service provider believes may be of interest to many of the life portal users or that the service provider would like to bring to the attention of the users because the content is of particularly high quality.
- a pre-defined life page named by the life portal service provider as CURRENT EVENTS may have pre-defined views such as a segment of the CNN web site or a view showing the front page of the Wall Street Journal.
- a life page can have pre-created magazines.
- a user can decide to keep or delete a pre-defined view or magazine in a life page and add her own views and magazines.
- a user can also change the name of the life page from CURRENT EVENTS to another name.
- the user can perform certain functions or actions on views, magazines and life pages. For example, a user can add or delete a view, magazine, or life page. A user can also edit a view, magazine, and life page. Some of the editing functions for a life page include the following: clean-up, save, delete, refresh, and rename. Some of the editing functions for a view include: fix, move, delete, refresh, rename, and set auto refresh rate.
- if a web page from which content originates undergoes a change in format or configuration, such as the insertion of a table, the user can execute a fix view operation.
- When a fix view operation is selected, a new window appears and the user can adjust the view as needed.
- the user can select a different table or segment from the page or can instruct the engine to use the seventh table instead of the sixth table in a page, and so on.
- the life portal engine will adjust how and from where it will retrieve data from the web sites. For example, a table on a web page may have been moved, re-sized, or changed in some manner. Many popular sites reconfigure the layout of their pages often.
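One way to picture the fix view operation is a stored view definition that records which table on the page to show, and a fix step that re-points it after the page layout changes. The `ParsedViewDef` type, the zero-based `table_index`, and both function names below are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ParsedViewDef:
    url: str
    table_index: int  # zero-based position of the chosen table on the page


def render(view: ParsedViewDef, tables: list[str]) -> Optional[str]:
    """Return the table the view points at, or None if it no longer exists."""
    if 0 <= view.table_index < len(tables):
        return tables[view.table_index]
    return None


def fix_view(view: ParsedViewDef, new_table_index: int) -> ParsedViewDef:
    """Return a corrected copy of the view pointing at a different table."""
    return replace(view, table_index=new_table_index)
```

If a site inserts a new table ahead of the one the user picked, the stored index silently shows the wrong table; selecting the next table in the fix view window corresponds to calling `fix_view` with the index incremented by one.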
- Magazine 402 is a list of headlines and links to corresponding articles stored at the life portal service provider servers and originally scraped from third-party web sites.
- the articles and content for a magazine are compiled from content scraped from web sites by the service provider.
- the various content is aggregated to form the text of the magazine articles.
- views are also unique to the user.
- a view, in contrast to a magazine, is from a single web site and shows content from a single page of a selected web site. However, the user dictates what will be in the view and what content of the page will comprise the view. In the described embodiment, there are two types of views: parsed views and pixel views.
- a parsed view is content from a single table taken from a web page from a web site.
- Many web sites organize their data in web pages with tables.
- the life portal engine parses a web page into its separate tables.
- a pixel view results from retrieving an entire web page from a web site and allowing the user to display any segment of the page and does not involve parsing the web page or identifying tables in a web page.
- a parsed view is created from parsing a web site into tables.
- web sites often use tables to delineate and format content on a web page. Many web sites use tables in this manner.
- a web page is parsed to separate the tables, each table containing a portion of content of the web site. The user selects which table will comprise the view.
- a user moves a cursor over the tables after the page has been parsed and clicks on the table she wants. As the cursor moves over the tables, delimiters around the tables change indicating that the user is in a new table.
- other HTML elements may be parsed, such as DIV or SPAN tags.
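The table-splitting step for a parsed view can be illustrated with a short sketch. This is not the life portal engine's actual parser; it is a minimal example that assumes Python's standard `html.parser` module, and the names `TableSplitter`, `split_tables`, and `select_table` are hypothetical. It collects the raw HTML of every table (nested tables included) so that one can be chosen by position, as when the engine is told to use the seventh table instead of the sixth. Comments and doctype declarations are dropped for brevity.

```python
from html.parser import HTMLParser


class TableSplitter(HTMLParser):
    """Collects the raw HTML of every <table> on a page, including nested ones."""

    def __init__(self):
        # Keep entities such as &amp; untouched so the extracted HTML stays faithful.
        super().__init__(convert_charrefs=False)
        self.tables = []   # one slot per table, in order of its opening tag
        self._open = []    # stack of (slot index, collected parts) for open tables

    def _emit(self, text):
        # Every table that is currently open contains this piece of markup.
        for _, parts in self._open:
            parts.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append(None)
            self._open.append((len(self.tables) - 1, []))
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "table" and self._open:
            index, parts = self._open.pop()
            self.tables[index] = "".join(parts)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def split_tables(html):
    """Return the HTML of each properly closed table on the page."""
    parser = TableSplitter()
    parser.feed(html)
    parser.close()
    return [table for table in parser.tables if table is not None]


def select_table(html, position):
    """Return the table at a 1-based position, e.g. 7 for the seventh table."""
    tables = split_tables(html)
    if not 1 <= position <= len(tables):
        raise IndexError(f"page has {len(tables)} tables, not {position}")
    return tables[position - 1]
```

The same stack-based approach could be extended to DIV or SPAN elements by widening the tag check, although that is an assumption rather than something the description spells out.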
- a pixel view is the entire web page offset behind what is visible via the portlet; in other words, a pixel view masks portions of the web page the user does not want to see.
- the portal engine does not parse the web page.
- the entire page is loaded and configured such that the only content visible in the view is content that the user wants to see regardless of the table configuration on the web page.
- a pixel view is selected by a user by using a cursor to define an area on the web page that the user wants to be the view.
- the boxed area can be drawn anywhere on the page when defining a pixel view. Once the area has been defined, the content from the web page is placed in a portlet and becomes the view.
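The masking behind a pixel view comes down to simple geometry: the portlet takes the size of the boxed area, and the full page is shifted up and to the left so that only the boxed area shows through. The sketch below illustrates that arithmetic only. The `Rect` type, the `pixel_view_layout` function, and the returned key names are hypothetical and are not taken from the described embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Area the user boxed on the page, in pixels from the page's top-left corner."""
    left: int
    top: int
    width: int
    height: int


def pixel_view_layout(selection, page_width, page_height):
    """Clamp the selection to the page and return portlet size plus page offset."""
    left = max(0, min(selection.left, page_width))
    top = max(0, min(selection.top, page_height))
    width = max(0, min(selection.width, page_width - left))
    height = max(0, min(selection.height, page_height - top))
    return {
        "portlet_width": width,
        "portlet_height": height,
        # Negative offsets slide the page behind the portlet so that the
        # selected area lines up with the portlet's visible window.
        "page_offset_x": -left,
        "page_offset_y": -top,
    }
```

For a 400 by 250 pixel box drawn 120 pixels from the left and 300 pixels from the top, the page would be shifted by (-120, -300) behind a 400 by 250 portlet.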
- the user can choose whether a view is parsed or pixel.
- the underlying structure of a view is determined by the life portal service provider. The fact that there are different types of views is visually transparent to the user. However, if content from a web site is displayed as a pixel view, an entire web page is transmitted to the life portal. Consequently, pixel views may cause unintentionally large volumes of data to be transmitted to the user's computer, thereby consuming significant bandwidth and likely causing processing slowdowns on the life portal.
- parsed views result from creating content, i.e., a table, selected by the user.
- Tables can be nested within other tables.
- the user selects tables by using a pointing device to highlight the desired tables after the service provider has parsed the HTML on a web page. For example, when a table is highlighted the background and text colors may be inverted, images may be shown in the negative, and a delimiter separating the parsed tables, such as a red line, may become dashed and blink. The user then clicks on the selected table and the table becomes the view.
- a view can result from parsing a web page into tables (a parsed view) or from superimposing a portlet on an entire web page and displaying a portion of the page as a view (a pixel view), in which the other portions of the web page are masked from view.
- a web page is comprised of HTML code which can come in different flavors and types. Problems and unexpected results occur when HTML content is scraped from an originating site, transmitted to another site where the content, such as a web page, is manipulated in some manner and displayed in a meta-browser.
- a web page may have windows that pop up and display advertisements, or may have mechanisms for displaying error messages to users, which may be undesirable in a life page.
- a table selected from a scraped web page may reference code, such as javascript, not in the web page or in the code for the selected table. Therefore, it is often necessary to modify the HTML and other code contained in a web page so the page content can be displayed as a view in a life page or persistence panel.
- the client is provided with an applet that executes certain functions, such as parsing and rule engines (described below), and powers the life portal and enables communication with the life portal server.
- the client uses Active X controls or similar controls in the client browser to fetch content or HTML from web sites.
- the client invokes the fetching of web pages from third-party sources on the Internet or from the life portal servers.
- the life portal server is responsible for retrieving content and transmitting the modified content to the client to be displayed as views.
- neither implementation requires that the user download or install any atypical application software onto his or her computer.
- the user can access her life portal from any online computer.
- the browser must be able to accept applets and cookies (typical default settings) or execute Active X or equivalent controls.
- Active X only runs in certain operating system platforms and browsers.
- the client browser uses Active X controls to fetch content from web sites. Once the content is fetched, client side code is necessary to scrape content and rules are applied for displaying the content in a life page.
- the Active X controls developed by Microsoft, or equivalent controls or code developed by other third parties, only fetch content in the form of web pages. Controls or code may be developed by third parties or by the life portal service provider that perform the same or similar functionality as present Active X controls. These potential future controls may be used to perform the same functions as present Active X controls.
- code is embedded in the web pages; that is, javascript is created on the client in the web page in the browser and is executed in the browser.
- FIG. 6 is a diagram showing a server-side implementation of the life portal in accordance with one embodiment of the present invention.
- a client computer 602 implements a life portal 604.
- Life portal 604 has a life page containing a parsed view 606.
- a parsed view is used only for illustrative purposes. The process described also applies to pixel views. Using a parsed view provides the opportunity to describe the role of a parsing engine in the overall process.
- When life portal 604 is invoked or opened by a user, a request for each parsed and pixel view (among other data) is transmitted from client computer 602 to a life portal server 608 over the Internet. One portion of the request to life portal server 608 is for retrieving content for view 606.
- life portal server 608 processes the request from client 602 and retrieves the content from its own database ("cached" content) or from a third-party web site 610.
- the content, normally a web page for each view, is retrieved and processed by life portal server 608.
- the processing includes parsing for parsed views.
- the modified HTML is transmitted to client computer 602 and displayed as view 606 in life portal 604.
- client computer 602 performs minimal processing.
- the speed with which client 602 accesses the modified HTML and other content from life portal server 608 depends on the type of connection, e.g., dial-up, broadband, etc., between client 602 and the Internet.
- life portal server 608 has high-speed connectivity with the Internet, such as broadband or a T3 connection. This is expected for acceptable performance because server 608 may retrieve content concurrently, specifically entire web pages, for numerous life portal users.
- a request is an HTTP request from life portal 604 to server 608 and is associated with a view, such as view 606, and is, more specifically, from an inline frame, or iframe, representing view 606.
- the request contains all the parameters needed for server 608 to retrieve and process HTML content for view 606 and life portal 604.
- Life portal server 608 may have cached the content in its own database servers. If life portal server 608 goes to web sites to retrieve the HTML, it parses the code and extracts the table needed for parsed view 606. In the case of a pixel view, the web page is not parsed and the entire page is transmitted to life portal 604.
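The server-side handling of a single view request can be outlined as follows. This is a sketch under stated assumptions, not the described server's implementation: the parameter names (`url`, `view_type`, `table_position`) are hypothetical, the cache is assumed to hold whole pages keyed by URL, and the fetching and table-extraction steps are passed in as functions so the example stays self-contained.

```python
def handle_view_request(params, cache, fetch_page, extract_table):
    """Return the HTML the server would send back for one view.

    params        -- dict describing the view (hypothetical keys, see below)
    cache         -- dict-like store of previously retrieved pages, keyed by URL
    fetch_page    -- callable that retrieves a page from the third-party site
    extract_table -- callable that pulls one table out of a page by position
    """
    url = params["url"]

    # Use the server's own cached copy when there is one; otherwise go to
    # the third-party web site and remember the result.
    page = cache.get(url)
    if page is None:
        page = fetch_page(url)
        cache[url] = page

    # Parsed view: only the selected table is returned to the client.
    if params["view_type"] == "parsed":
        return extract_table(page, params["table_position"])

    # Pixel view: the page is not parsed and is transmitted in full.
    return page
```

Passing the fetcher and extractor in as arguments also makes the trade-off visible: every third-party request in this arrangement originates from the server, which is the source of the IP address and cookie concerns discussed for the server-side implementation.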
- although the server-side solution has advantages, for example, when the client is an Internet appliance or a so-called "thin" client, some aspects of its operation may be drawbacks under certain circumstances. For example, some high-traffic sites are sensitive to having the same IP address access them too frequently. Too many hits may slow down or bring down a web server, which is a legitimate concern for many popular web sites. When performance issues arise, the third-party web server may simply deny further access to the particular IP address. Another issue that arises in the server-side implementation is universal user authentication. Certain sites require user authentication to access content. Once a visitor is authenticated, typically by entering a username and password, the site stores a cookie on client computer 602.
- This allows the user to leave the site and not have to log back in if the user returns to the site within a pre-defined time frame, such as an hour, referred to as the expiration time for the cookie.
- storing cookies for users on life portal server 608 is burdensome and raises security issues with respect to the users' login names and passwords.
- Another issue that may arise from the server-side implementation is a third-party site suspecting that the life portal site is scraping its content and that it is consequently attracting more users and becoming more well known.
- the third-party site may disfavor the life portal site because the life portal service provider is accessing the content, effectively redisplaying it, and is likely not displaying the advertisements that the original site relies on for revenue.
- a client computer 702 plays a larger role in retrieving and processing HTML for a life portal 704.
- Life portal 704 still makes a request for each view to a life portal server 708 similar to the request made in the server-side implementation.
- server 708 returns only data that client computer 702 needs to retrieve the content directly from the third-party sites.
- Life portal 704 uses these parameters relating to the view to retrieve the entire web page from a content source 710. In some cases, the web page may be cached internally at the browser on client computer 702.
- Client computer 702 processes a web page using an applet it obtained when the user initially created life portal 704.
- the Java applet is embedded in the web page that the browser on client 702 reads during the initial life portal creation process.
- an applet is a component of a standard HTML element.
- An applet can be executed within a page using standard HTML tags as defined by the W3C. As such, the applet is downloaded without significant intervention from the user.
- the user follows the routine step of "signing" the applet by clicking a button saying the user accepts it.
- links used to launch the view creation process are components of a page (not necessarily limited to life pages) in a life portal, rather than being embedded in a page. These links are added to the "Favorites" toolbar of the user's browser.
- the links execute javascript.
- a user selects a button in her browser to perform a function in the life portal, such as creating a parsed or pixel view.
- Such a button or icon, which may be contained under the "Favorites" menu in the browser, has underlying client-side code.
- This underlying script and button may be referred to as a bookmarklet (an industry term for a bookmark that executes client-side script).
- a button of this type would normally be a link to another web page.
- a user at a particular web site may decide she wants to create a view in her life portal from that site. She opens her Favorites list in her browser and selects the 'Create a View' button or bookmarklet in the list. By doing so, the underlying client-side script for creating a view is invoked. By invoking the script, the client computer contacts the life portal server and transmits the URL of the site from which the user desires to create a view. Upon receiving this data, the life portal server transmits a uniquely configured page to the browser enabling the user to continue with the process of creating a view in a life page.
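The bookmarklet hand-off described above can be sketched from the server's point of view. Everything named here is an assumption made for illustration: the endpoint `https://lifeportal.example/create-view`, the `url` query parameter, and the functions `bookmarklet_href` and `site_url_from_request` are hypothetical. The first function builds the `javascript:` link that would be stored in the Favorites list; the second recovers the site URL once the browser contacts the server.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical address of the life portal server's view-creation page.
PORTAL_ENDPOINT = "https://lifeportal.example/create-view"


def bookmarklet_href(endpoint=PORTAL_ENDPOINT):
    """Build the javascript: link saved as the 'Create a View' favorite."""
    script = (
        "(function(){"
        f"window.location.href='{endpoint}?url='"
        "+encodeURIComponent(window.location.href);"
        "})();"
    )
    return "javascript:" + script


def site_url_from_request(request_url):
    """Recover the URL of the site the user was viewing from the server request."""
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("url")
    return values[0] if values else None
```

When the favorite is selected, the script runs in the context of the page being viewed, so the current address is available to it without the user typing anything.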
- the user simply follows the instructions for creating a life portal and in the process downloads the applet and other components needed for the portal. Downloading applets during an installation of any type of application or tool over the Internet is commonplace and generally transparent to the user.
- the user follows the same steps for creating a life portal except the applet is not embedded in the browser.
- the applet can still be imported in the server-side implementation for future use (e.g., if the user decides to switch to a client-side implementation) and be done transparently to the user.
- An applet on client 702 runs a parsing engine, a rules engine, and other functions on the HTML.
- the functions that run on client 702 in the client-side implementation generally also run on life portal server 608 in the server-side implementation.
- client 702 is responsible for scraping third-party sites for web pages associated with its views.
- for third-party web sites that require authentication and use cookies for re-entry to that site, a cookie from the site is stored on client 702, thereby allowing universal authentication of the user for that site.
- life portal server 708 does not have to store the cookie or any other secure data relating to the user.
- client 702 makes the request for HTML at third-party sites.
- high-traffic sites will not see the same IP address, i.e., the IP address of server 708, scraping content from them at all.
- An applet is needed on client 702 to process the HTML on the web page because browser security restrictions do not allow web developers to edit and manipulate content loaded in the user's browser from a different domain using client-side script. For example, a browser can access HTML at a third-party site, but cannot modify it within the browser using javascript from a different site.
- the applet enables the browser on client 702 to retrieve, process, and parse, if necessary, the HTML so it may be displayed as a view.
- a web page request is made through a Java component in the applet using standard techniques known in the field of Internet application programming.
- the service provider determines the appropriate applet for client 702 based on the version of the Java Virtual Machine (JVM) that resides on the client, e.g., the Microsoft JVM or the Sun JVM, which the applet needs in order to run Java classes, and sends the appropriate applet to the client.
- the user does not need to download any applications to upgrade or modify the applet so it is compatible with a particular JVM.
- the life portal applies a set of rules to the web content before it is displayed.
- rules are applied to a web page according to a domain/rule set mapping table.
- these rules are applied by the applet.
- the domain/rule set mapping table is derived from two sources: a list of known domains and a set of rules.
- the list of known domains contains the names of web sites from which the life portal service provider scrapes content or, more broadly, for which it wants to establish rules. Typically, these will be domains from which content is scraped regularly or frequently.
- the list will expand to include sites requested by life portal users and from which content has been retrieved (either by the client or the life portal servers), and for which a rule mapping has been derived.
- FIG. 8 is a diagram showing a logical representation of sample data sets or tables that may be used to apply rules to content before the content is displayed in a life page in accordance with one embodiment of the present invention.
- the tables shown in FIG. 8 are illustrative of the concepts and data constructs behind the application of rules to content. The actual implementation and programming of these logical constructs may take on various forms and can be done by a person of ordinary skill in the field of computer programming.
- a list of known domains is shown as Table 1. Each domain has a corresponding unique identifier.
- Table 2 is a listing of rules or parameters that a rules engine embedded in an applet or portal engine applies to prevent unwanted behavior or breakdowns when editing content and displaying it as a view. Examples of these rules are provided below.
- each rule has a unique identifier, preferably having a different format from the unique identifier used to identify the domain names in Table 1.
- the identifiers may be alphanumeric or use only characters, as in the example shown in Table 2.
- Each rule addresses one issue or problem that may arise when displaying content as a view in a life page.
- a majority of the problems that typically arise stem from javascript code in the web page, but may arise from other types of code.
- the life portal service provider anticipates many of the problems that may occur and has derived a rule or set of rules to address each problem. It is expected that unanticipated problems will arise when dealing with new sites or with new types of content. When this occurs, the service provider derives a solution to the issue and incorporates it as a rule in Table 2.
- Tables 1 and 2 are not static listings but rather listings that are expected to grow as the number of life portal users increases and the types of content in a web page diversifies.
- using Tables 1 and 2 and its knowledge of which rules should be applied to each known domain, the life portal service provider creates a mapping of domain names to rules.
- the service provider can also anticipate or detect problems that may occur beforehand and derive rules to address such problems. However, it is possible that applying one rule to a web page from a particular domain will work as expected but applying the same rule to a page from another domain will produce unwanted results. By applying a rule across pages from all listed domains, certain pages may be fixed but others may be damaged or not affected. Therefore, it is important that the service provider keep track of which rules to apply to each domain.
- Table 3 is an illustration of a domain name/rules mapping table that accomplishes this task.
- the life portal service provider determines which rule or rules, if any, need to be applied to each web site from which the service provider will be scraping content. For example, a breakdown may occur when a portion of HTML representing a view ("view HTML") is parsed from a web page containing javascript. The web page has HTML, however, only a portion of it, such as a table, is needed for the view. The portion needed may have HTML that is dependent on HTML or javascript that is not resident in the view HTML. As a result, when the view HTML executes on the life portal, the user sees an error message resulting from the invocation of code that does not exist in the view HTML.
- a rule or parameter to address this issue may be to modify the tags in the view HTML so that the error message does not appear and the user can continue operation uninterrupted.
- Another possible rule may be to include the dependent HTML or javascript with the view HTML.
- the rule or rules are selected by the life portal service provider, inserted in Table 2 and associated with one or more domains. This association is inserted in Table 3.
- a parsing engine scans the entire page and determines which of the existing rules need to be applied to the page and applies the relevant rules to the page. If the service provider determines that existing rules will not address problems arising from the view HTML, the service provider derives additional rules and adds them to Table 2.
- rules are invoked by a rules engine in the client applet that associates domains to rules as shown in Table 3.
- the rules engine determines the domain of the web page. For example, the engine detects that the web page is from CNN.com, checks the domain/rules table, and determines which rules should be applied to the CNN.com domain identifier. The rules are retrieved and applied to the page, thereby potentially modifying the web page in some manner. The parsing engine then parses the page. When the engine detects that a subsequent web page is from another domain, a different set of rules will be applied, although the rule(s) may happen to be the same as for the CNN.com domain.
- the domain/rules mapping table is consulted to see which rules will be applied.
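The three logical tables and the lookup performed by the rules engine can be pictured with plain dictionaries. This is only an illustrative sketch: the identifiers, the `example.org` entry, the rule descriptions, and the function names are hypothetical, and matching a host name to a known domain by suffix is an assumption rather than something the description specifies.

```python
from urllib.parse import urlsplit

# Table 1 (assumed contents): known domains and their unique identifiers.
KNOWN_DOMAINS = {"cnn.com": 1, "example.org": 2}

# Table 2 (assumed contents): rules, keyed by character-only identifiers.
RULE_DESCRIPTIONS = {
    "AA": "suppress javascript error messages",
    "AB": "divert pop-up windows to a method that does nothing",
}

# Table 3 (assumed contents): which rules apply to which domain identifier.
DOMAIN_RULES = {1: ["AA", "AB"], 2: ["AA"]}


def domain_id(url):
    """Return the Table 1 identifier for the page's domain, or None if unknown."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, identifier in KNOWN_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return identifier
    return None


def rules_for(url):
    """Consult the domain/rules mapping and list the rule identifiers to apply."""
    identifier = domain_id(url)
    if identifier is None:
        return []
    return list(DOMAIN_RULES.get(identifier, []))


def apply_rules(url, html, implementations):
    """Apply each mapped rule in order; the result is then ready for parsing."""
    for rule_id in rules_for(url):
        html = implementations[rule_id](html)
    return html
```

Keeping the mapping separate from the rule definitions mirrors the point made in the text: a rule that fixes pages from one domain may damage pages from another, so rules are opted in per domain rather than applied globally.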
- when the life portal service provider adds new domains, it examines the web pages from the domain and determines which existing rules, or whether any new rules, are needed to address potential breakdowns or problems in displaying the view HTML from the new page. It is expected that the rules and the domain listing will grow with time and as the life portal gets more users. It is also possible that life portal users may request web pages from domains not listed in Table 1.
- the javascript is diverted to a method that does nothing.
- the life portal service provider adds a dummy or non-functioning method to the javascript. If there is an "open" method in the javascript, the new method is called; if there is no "open" method, the new method is not called.
- the parsing engine retrieves all of the HTML from a page and parses out any style sheet references, javascript references, and the specific HTML that the user wants to see (in many cases embedded in a table, but not necessarily) and returns these components to the portlet. If no rules are applied to the web page, there may be times when javascript in the view references HTML elements that previously existed in the page, but were removed during the parsing process. If the javascript were to execute, an error would occur. Because this error results only from having modified the HTML content of the original page, the user would not be expecting it. Thus, it is important that the life portal suppress this error message. This is done by applying a rule that suppresses all javascript errors.
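Two of the rules just described can be sketched as small HTML transformations. This is one possible realization and not the patent's own code: it assumes that the browser honors a `window.onerror` handler that returns `true` (which prevents the default error report) and that reassigning `window.open` diverts pop-up calls to a function that does nothing. The function and constant names are hypothetical.

```python
# Script that swallows every javascript error raised inside the view.
ERROR_GUARD = "<script>window.onerror = function () { return true; };</script>"

# Script that replaces window.open with a method that does nothing.
OPEN_STUB = "<script>window.open = function () { return null; };</script>"


def suppress_js_errors(view_html):
    """Prepend the error guard so script errors in the view stay silent."""
    if ERROR_GUARD in view_html:
        return view_html  # rule already applied
    return ERROR_GUARD + view_html


def divert_open(view_html):
    """Prepend the dummy 'open' method; it only runs if the page calls open."""
    if OPEN_STUB in view_html:
        return view_html  # rule already applied
    return OPEN_STUB + view_html
```

Both snippets are prepended so that they are in place before any script carried over from the original page executes, and each function leaves already-treated HTML unchanged so that a rule can be applied more than once safely.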
- the Internet is used as the primary medium in which content and other data is transmitted and web sites as the primary content sources from which content is scraped and viewed on a life portal.
- the content sources and medium are not limited to web sites and the Internet.
- Other forms of electronic data distribution could be used to gather information; information could be gathered from a variety of electronic sources other than web sites; and it can be processed and displayed via user interfaces and viewing tools other than Internet browsers (e.g., displays on hand held devices, smart devices, and the like).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Information Transfer Between Computers (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02797377A EP1570364A4 (en) | 2002-11-15 | 2002-12-17 | A customized life portal on the internet |
AU2002361740A AU2002361740A1 (en) | 2002-11-15 | 2002-12-17 | A customized life portal on the internet |
CA002505837A CA2505837A1 (en) | 2002-11-15 | 2002-12-17 | A customized life portal on the internet |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/298,183 | 2002-11-15 | ||
US10/298,182 | 2002-11-15 | ||
US10/298,181 US20040098467A1 (en) | 2002-11-15 | 2002-11-15 | Methods and systems for implementing a customized life portal |
US10/298,181 | 2002-11-15 | ||
US10/298,183 US20040098451A1 (en) | 2002-11-15 | 2002-11-15 | Method and system for modifying web content for display in a life portal |
US10/298,182 US20040098360A1 (en) | 2002-11-15 | 2002-11-15 | Customized life portal |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004046945A1 (en) | 2004-06-03 |
WO2004046945A8 WO2004046945A8 (en) | 2004-08-05 |
Family
ID=32329795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2002/040319 WO2004046945A1 (en) | 2002-11-15 | 2002-12-17 | A customized life portal on the internet |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1570364A4 (en) |
KR (1) | KR20050084999A (en) |
AU (1) | AU2002361740A1 (en) |
CA (1) | CA2505837A1 (en) |
WO (1) | WO2004046945A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6185587B1 (en) * | 1997-06-19 | 2001-02-06 | International Business Machines Corporation | System and method for building a web site with automated help |
US6209007B1 (en) * | 1997-11-26 | 2001-03-27 | International Business Machines Corporation | Web internet screen customizing system |
US6286043B1 (en) * | 1998-08-26 | 2001-09-04 | International Business Machines Corp. | User profile management in the presence of dynamic pages using content templates |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1188134A2 (en) * | 1999-02-11 | 2002-03-20 | Ezlogin.com.Inc. | Personalized access to web sites |
EP1389318B1 (en) * | 1999-08-31 | 2005-12-21 | Lucent Technologies Inc. | Method and apparatus for web-site-independent information personalization from multiple sites having user-determined extraction functionality |
US20020152279A1 (en) * | 2001-04-12 | 2002-10-17 | Sollenberger Deborah A. | Personalized intranet portal |
2002
- 2002-12-17 KR KR1020057008758A patent/KR20050084999A/en not_active Application Discontinuation
- 2002-12-17 WO PCT/US2002/040319 patent/WO2004046945A1/en not_active Application Discontinuation
- 2002-12-17 EP EP02797377A patent/EP1570364A4/en not_active Withdrawn
- 2002-12-17 AU AU2002361740A patent/AU2002361740A1/en not_active Abandoned
- 2002-12-17 CA CA002505837A patent/CA2505837A1/en not_active Abandoned
Non-Patent Citations (4)
Title |
---|
"Customer portals: the value framework", 23 January 2002 (2002-01-23), pages 1 - 6, XP002961227, Retrieved from the Internet <URL:www.delphigroup.com> * |
"Employee self-service and life and work events for a new user centric perspective", SAP DESIGN GUILD, 9 February 2001 (2001-02-09), pages 1 - 7, XP002961229, Retrieved from the Internet <URL:www.sapdesignguild.org> * |
"Introducing WebLogic portal", 11 February 2001 (2001-02-11), pages 1 - 11, XP002961228, Retrieved from the Internet <URL:www.edocs.bea.com> * |
See also references of EP1570364A4 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009001137A1 (en) * | 2007-06-28 | 2008-12-31 | Taptu Ltd | Interactive web scraping of online content for search and display on mobile devices |
GB2462564A (en) * | 2007-06-28 | 2010-02-17 | Taptu Ltd | Interactive web scraping of online content for search and display on mobile devices |
US9176956B2 (en) | 2008-04-07 | 2015-11-03 | Lg Electronics Inc. | Apparatus and method for providing search screen |
Also Published As
Publication number | Publication date |
---|---|
EP1570364A2 (en) | 2005-09-07 |
CA2505837A1 (en) | 2004-06-03 |
KR20050084999A (en) | 2005-08-29 |
WO2004046945A8 (en) | 2004-08-05 |
AU2002361740A1 (en) | 2004-06-15 |
EP1570364A4 (en) | 2008-03-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
CFP | Corrected version of a pamphlet front page | Free format text: UNDER (57) PUBLISHED ABSTRACT REPLACED BY CORRECT ABSTRACT |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase | Ref document number: 2505837 Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 2002361740 Country of ref document: AU |
WWE | Wipo information: entry into national phase | Ref document number: 1020057008758 Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 2002797377 Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 1020057008758 Country of ref document: KR |
WWP | Wipo information: published in national office | Ref document number: 2002797377 Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Country of ref document: JP |