WO2005089403A2 - Methods and apparatus for navigating an image - Google Patents

Methods and apparatus for navigating an image

Info

Publication number
WO2005089403A2
Authority
WO
WIPO (PCT)
Prior art keywords
elements
zoom level
image
linear size
pixel
Prior art date
Application number
PCT/US2005/008812
Other languages
French (fr)
Other versions
WO2005089403A3 (en)
Inventor
Blaise Aguera Y Arcas
Original Assignee
Seadragon Software, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
Priority claimed from US10/803,010 (US7133054B2)
Application filed by Seadragon Software, Inc.
Priority to CA2558833A (CA2558833C)
Priority to JP2007504079A (JP4861978B2)
Priority to EP05740967A (EP1759354A2)
Publication of WO2005089403A2
Publication of WO2005089403A3

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00 Maps; Plans; Charts; Diagrams, e.g. route diagram
    • G09B29/003 Maps
    • G09B29/006 Representation of non-cartographic information on maps, e.g. population distribution, wind direction, radiation levels, air and sea routes
    • G09B29/007 Representation of non-cartographic information on maps, e.g. population distribution, wind direction, radiation levels, air and sea routes using computer methods
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00 Maps; Plans; Charts; Diagrams, e.g. route diagram
    • G09B29/10 Map spot or coordinate position indicators; Map reading aids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/048 Indexing scheme relating to G06F3/048
    • G06F2203/04806 Zoom, i.e. interaction techniques or interactors for controlling the zooming operation

Definitions

  • the present invention relates to methods and apparatus for navigating, such as zooming and panning, over an image of an object in such a way as to provide the appearance of smooth, continuous navigational movement.
  • GUIs graphical computer user interfaces
  • visual components may be represented and manipulated such that they do not have a fixed spatial scale on the display; indeed, the visual components may be panned and/or zoomed in or out.
  • the ability to zoom in and out on an image is desirable in connection with, for example, viewing maps, browsing through text layouts such as newspapers, viewing digital photographs, viewing blueprints or diagrams, and viewing other large data sets.
  • zoomable components are a peripheral aspect of a user's interaction with the software and the zooming feature is only employed occasionally.
  • These computer applications permit a user to pan over an image smoothly and continuously (e.g., utilizing scroll bars or the cursor to translate the viewed image left, right, up or down).
  • a significant problem with such computer applications is that they do not permit a user to zoom smoothly and continuously. Indeed, they provide zooming in discrete steps, such as 10%, 25%, 50%, 75%, 100%, 150%, 200%, 500%, etc.
  • the user selects the desired zoom using the cursor and, in response, the image changes abruptly to the selected zoom level.
  • the undesirable qualities of discontinuous zooming also exist in Internet-based computer applications.
  • the computer application underlying the www.mapquest.com website illustrates this point.
  • the MapQuest website permits a user to enter one or more addresses and receive an image of a roadmap in response.
  • FIGS. 1-4 are examples of images that one may obtain from the MapQuest website in response to a query for a regional map of Long Island, NY, U.S.A.
  • the MapQuest website permits the user to zoom in and zoom out to discrete levels, such as 10 levels.
  • FIG. 1 is a rendition at zoom level 5, which is approximately 100 meters/pixel.
  • FIG. 2 is an image at a zoom level 6, which is about 35 meters/pixel.
  • FIG. 3 is an image at a zoom level 7, which is about 20 meters/pixel.
  • FIG. 4 is an image at a zoom level 9, which is about 10 meters/pixel.
  • the abrupt transitions between zoom levels result in a sudden and abrupt loss of detail when zooming out and a sudden and abrupt addition of detail when zooming in.
  • no local, secondary or connecting roads may be seen in FIG. 1 (at zoom level 5), although secondary and connecting roads suddenly appear in FIG. 2, which is the very next zoom level.
  • Such abrupt discontinuities are very displeasing when utilizing the MapQuest website.
  • A1 roads may be about 16 meters wide, A2 roads may be about 12 meters wide, A3 roads may be about 8 meters wide, A4 roads may be about 5 meters wide, and A5 roads may be about 2.5 meters wide.
  • the MapQuest computer application deals with these varying levels of coarseness by displaying only the road categories deemed appropriate at a particular zoom level. For example, a nation-wide view might only show A1 roads, while a state-wide view might show A1 and A2 roads, and a county-wide view might show A1, A2 and A3 roads. Even if MapQuest were modified to allow continuous zooming of the roadmap, this approach would lead to the sudden appearance and disappearance of road categories during zooming, which is confusing and visually displeasing. In view of the foregoing, there are needs in the art for new methods and apparatus for navigating images of complex objects, which permit smooth and continuous zooming of the image while also preserving visual distinctions between the elements of the objects based on their size or importance.
  • methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
  • the scale power a is not equal to -1 (typically -1 < a < 0) within a range of zoom levels z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
  • At least one of z0 and z1 may vary for one or more elements of the object. It is noted that a, c and d may also vary from element to element. At least some elements of the at least one object may also be scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
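The passage names a scale power a, a constant c and a physical size d, which suggests a power law of the form p = c * d * z**a. That form is inferred here from the parameter names rather than quoted from the text, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
def rendered_size_px(d: float, z: float, a: float = -1.0, c: float = 1.0) -> float:
    """Rendered linear size in pixels, ASSUMING the form p = c * d * z**a.

    d: physical linear size of the element (e.g. meters)
    z: zoom level in physical units per pixel (e.g. meters/pixel), z > 0
    a: scale power; a == -1 with c == 1 gives physically proportional scaling
    c: constant factor
    """
    if z <= 0:
        raise ValueError("zoom level must be positive")
    return c * d * z ** a


# Physical scaling: a 16 m wide road at 2 meters/pixel covers 8 pixels.
physical = rendered_size_px(16.0, 2.0)

# Doubling z (zooming out) halves the physical width, but with -1 < a < 0
# the width shrinks by a smaller factor.
ratio_physical = rendered_size_px(16.0, 4.0) / rendered_size_px(16.0, 2.0)
ratio_non_physical = (
    rendered_size_px(16.0, 4.0, a=-0.5) / rendered_size_px(16.0, 2.0, a=-0.5)
)
```

Because a, c and d may vary per element, a caller would typically keep one parameter set per road class and evaluate this function per element.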
  • the invention may also be embodied in a software program for storage in a suitable storage medium and execution by a processing unit.
  • the elements of the object may be of varying degrees of coarseness.
  • the coarseness of the elements of a roadmap object manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads.
  • Degree of coarseness in road categories also manifests in such properties as average road length, frequency of intersections, and maximum curvature.
  • the coarseness of the elements of other image objects may manifest in other ways too numerous to list in their entirety.
  • the scaling of the elements in a given predetermined image may be physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given predetermined image.
  • the object may be a roadmap
  • the elements of the object may be roads
  • the varying degrees of coarseness may be road hierarchies.
  • the scaling of a given road in a given predetermined image may be physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given predetermined image.
  • methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
  • methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
  • methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of a roadmap to a client terminal over a communications channel; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
  • methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of at least one object to a client terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; blending two of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
  • FIG. 1 is an image taken from the MapQuest website, which is at a zoom level 5;
  • FIG. 2 is an image taken from the MapQuest website, which is at a zoom level 6;
  • FIG. 3 is an image taken from the MapQuest website, which is at a zoom level 7;
  • FIG. 4 is an image taken from the MapQuest website, which is at a zoom level 9;
  • FIG. 5 is an image of Long Island produced at a zoom level of about 334 meters/pixel in accordance with one or more aspects of the present invention;
  • FIG. 6 is an image of Long Island produced at a zoom level of about 191 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 7 is an image of Long Island produced at a zoom level of about 109.2 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 8 is an image of Long Island produced at a zoom level of about 62.4 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 9 is an image of Long Island produced at a zoom level of about 35.7 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 10 is an image of Long Island produced at a zoom level of about 20.4 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 11 is an image of Long Island produced at a zoom level of about 11.7 meters/pixel in accordance with one or more further aspects of the present invention
  • FIG. 12 is a flow diagram illustrating process steps that may be carried out in order to provide smooth and continuous navigation of an image in accordance with one or more aspects of the present invention
  • FIG. 13 is a flow diagram illustrating further process steps that may be carried out in order to smoothly navigate an image in accordance with various aspects of the present invention
  • FIG. 14 is a log-log graph of a line width in pixels versus a zoom level in meters/pixel illustrating physical and non-physical scaling in accordance with one or more further aspects of the present invention; and
  • FIG. 15 is a log-log graph illustrating variations in the physical and non-physical scaling of FIG. 14.
  • FIGS. 16A-D illustrate respective antialiased vertical lines whose endpoints are precisely centered on pixel coordinates;
  • FIGS. 17A-C illustrate respective antialiased lines on a slant, with endpoints not positioned to fall at exact pixel coordinates;
  • FIG. 18 is the log-log graph of line width versus zoom level of FIG. 14 including horizontal lines indicating incremental line widths, and vertical lines spaced such that the line width over the interval between two adjacent vertical lines changes by no more than two pixels.
  • FIGS. 5-11 show a series of images representing the road system of Long Island, NY, U.S.A., where each image is at a different zoom level (or resolution).
  • zoom level or resolution
  • the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety.
  • the features of the present invention may be used to navigate images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to navigating images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, images of a roadmap.
  • the image 100A of the roadmap illustrated in FIG. 5 is at a zoom level that may be characterized by units of physical length/pixel (or physical linear size/pixel).
  • the zoom level, z, represents the actual physical linear size that a single pixel of the image 100A represents.
  • the zoom level is about 334 meters/pixel.
  • FIG. 6 is an image 100B of the same roadmap as FIG.
  • Another significant feature of the present invention as illustrated in FIGS. 5-11 is that little or no detail abruptly appears or disappears when zooming from one level to another level.
  • the roadmap includes elements (i.e., roads) of varying degrees of coarseness.
  • FIG. 8 includes at least A1 highways such as 102, A3 secondary roads such as 104, and A4 local roads such as 106. Yet these details, even the A4 local roads 106, may still be seen in image 100A of FIG. 5, which is substantially zoomed out in comparison with the image 100D of FIG. 8.
  • the A1, A2, A3, and A4 roads may be distinguished from one another. Even differences between A1 primary highways 102 and A2 primary roads 108 may be distinguished from one another vis-a-vis the relative
  • FIGS. 12-13 are flow diagrams illustrating process steps that are preferably carried out by the one or more computing devices and/or related equipment.
  • the process flow is carried out by commercially available computing equipment (such as Pentium-based computers)
  • any of a number of other techniques may be employed to carry out the process steps without departing from the spirit and scope of the present invention as claimed.
  • the hardware employed may be implemented utilizing any other known or hereinafter developed technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, one or more programmable digital devices or systems, such as programmable read only memories (PROMs), programmable array logic devices (PALs), any combination of the above, etc.
  • the methods of the present invention may be embodied in a software program that may be stored on any of the known or hereinafter developed media.
  • FIG. 12 illustrates an embodiment of the invention in which a plurality of images are prepared (each at a different zoom level or resolution), action 200, and two or more of the images are blended together to achieve the appearance of smooth navigation, such as zooming (action 206).
  • a service provider would expend the resources to prepare a plurality of pre-rendered images (action 200) and make the images available to a user's client terminal over a communications channel, such as the Internet (action 202).
  • the pre-rendered images may be an integral or related part of an application program that the user loads and executes on his or her computer. It has been found through experimentation that, when the blending approach is used, a set of images at the following zoom levels works well when the image object is a roadmap: 30 meters/pixel, 50 meters/pixel, 75 meters/pixel, 100 meters/pixel, 200 meters/pixel, 300 meters/pixel, 500 meters/pixel, 1000 meters/pixel, and 3000 meters/pixel. It is noted, however, that any number of images may be employed at any number of resolutions without departing from the scope of the invention.
  • in response to user-initiated navigation commands (action 204), such as zooming commands, the client terminal is preferably operable to blend two or more images in order to produce an intermediate resolution image that coincides with the navigation command (action 206).
  • This blending may be accomplished by a number of methods, such as the well-known trilinear interpolation technique described by Lance Williams, Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH '83, 17(3): 1-11 (1983), the entire disclosure of which is incorporated herein by reference.
  • the present invention does not require or depend on any particular one of these blending methods.
  • the user may wish to navigate to a zoom level of 62.4 meters/pixel.
  • this zoom level may be between two of the pre-rendered images (e.g., in this example between zoom level 50 meters/pixel and zoom level 75 meters/pixel)
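Finding the two pre-rendered images that enclose a requested zoom level can be sketched as a sorted-list lookup. The list of levels is the one reported in the text; the function name and the clamping behaviour at the ends of the range are assumptions made for illustration.

```python
from bisect import bisect_left

# Zoom levels (meters/pixel) reported in the text as working well for a roadmap.
PRERENDERED_LEVELS = [30, 50, 75, 100, 200, 300, 500, 1000, 3000]


def bracketing_levels(z: float, levels=PRERENDERED_LEVELS) -> tuple:
    """Return the (finer, coarser) pre-rendered levels that enclose zoom z.

    Assumption: requests outside the available range are clamped to the
    nearest level, in which case both entries of the result are the same.
    """
    i = bisect_left(levels, z)
    if i == 0:
        return levels[0], levels[0]
    if i == len(levels):
        return levels[-1], levels[-1]
    if levels[i] == z:
        return levels[i], levels[i]
    return levels[i - 1], levels[i]


print(bracketing_levels(62.4))  # (50, 75)
print(bracketing_levels(35.7))  # (30, 50)
```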
  • the desired zoom level of 62.4 meters/pixel may be achieved using the trilinear interpolation technique.
  • any zoom level between 50 meters/pixel and 75 meters/pixel may be obtained utilizing a blending method as described above, which if performed quickly enough provides the appearance of smooth and continuous navigation.
  • The technique may be carried through to other zoom levels, such as the 35.7 meters/pixel level illustrated in FIG. 9.
  • the blending technique may be performed as between the pre-rendered images of 30 meters/pixel and 50 meters/pixel of the example discussed thus far.
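The blend itself can be illustrated with a plain per-pixel cross-fade. This is a simplified stand-in, not the full trilinear technique cited in the text: it assumes both pre-rendered images have already been resampled to the same pixel grid, and it assumes the blend weight is interpolated linearly in the logarithm of the zoom level. All names are hypothetical.

```python
import math


def blend_weight(z: float, z_fine: float, z_coarse: float) -> float:
    """Weight of the coarser image, interpolating linearly in log(zoom)."""
    if z_fine == z_coarse:
        return 0.0
    return (math.log(z) - math.log(z_fine)) / (math.log(z_coarse) - math.log(z_fine))


def blend_images(fine, coarse, t: float):
    """Per-pixel linear blend of two equally sized grayscale images (lists of rows)."""
    return [
        [(1.0 - t) * f + t * c for f, c in zip(row_f, row_c)]
        for row_f, row_c in zip(fine, coarse)
    ]


# Intermediate zoom of 62.4 meters/pixel between the 50 and 75 meters/pixel images.
t = blend_weight(62.4, 50.0, 75.0)
fine = [[0.0, 1.0], [1.0, 0.0]]    # toy 2x2 stand-in for the 50 m/pixel image
coarse = [[1.0, 1.0], [0.0, 0.0]]  # toy 2x2 stand-in for the 75 m/pixel image
intermediate = blend_images(fine, coarse, t)
```

At the endpoints the weight is 0 or 1, so the displayed image coincides with a pre-rendered image exactly when the requested zoom level matches one.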
  • the above blending approach may be used when the computing power of the processing unit on which the invention is carried out is not high enough to (i) perform the rendering operation in the first instance, and/or (ii) perform image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation.
  • image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation.
  • FIG. 13 illustrates the detailed steps and/or actions that are preferably conducted to prepare one or more images in accordance with the present invention.
  • the information is obtained regarding the image object or objects using any of the known or hereinafter developed techniques.
  • image objects have been modeled using appropriate primitives, such as polygons, lines, points, etc.
  • appropriate primitives such as polygons, lines, points, etc.
  • UTM Universal Transverse Mercator
  • the model is usually in the form of a list of line segments (in any coordinate system) that comprise the roads in the zone.
  • the list may be converted into an image in the spatial domain (a pixel image) using any of the known or hereinafter developed rendering processes so long as it incorporates certain techniques for determining the weight (e.g., apparent or real thickness) of a given primitive in the pixel (spatial) domain.
  • the rendering processes should incorporate certain techniques for determining the weight of the lines that model the roads of the roadmap in the spatial domain. These techniques will be discussed below.
  • the elements of the object are classified.
  • the classification may take the form of recognizing already existing categories, namely, A1, A2, A3, A4, and A5. Indeed, these road elements have varying degrees of coarseness and, as will be discussed below, may be rendered differently based on this classification.
  • mathematical scaling is applied to the different road elements based on the zoom level. As will be discussed in more detail below, the mathematical scaling may also vary based on the element classification.
  • the pre-set pixel width approach dictates that every road is a certain pixel width, such as one pixel in width on the display.
  • Major roads, such as highways, may be emphasized by making them two pixels wide, etc.
  • this approach makes the visual density of the map change as one zooms in and out. At some level of zoom, the result might be pleasing, e.g., at a small-size county level. As one zooms in, however, roads would not thicken, making the map look overly sparse. Further, as one zooms out, roads would run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable.
  • the images are produced in such a way that at least some image elements are scaled up and/or down either (i) physically proportional to the zoom level; or (ii) non-physically proportional to the zoom level, depending on parameters that will be discussed in more detail below. It is noted that the scaling being "physically proportional to the zoom level" means that the number of pixels representing the road width increases or decreases with the zoom level as the size of an element would appear to change with its distance from the human eye.
  • zooming in is equivalent to moving an object closer to the viewer, and zooming out is equivalent to moving the object farther away.
  • a may be set to a power law other than -1
  • d' may be set to a physical linear size other than the actual physical linear size d.
  • non-physically proportional to the zoom level means that the road width in display pixels increases or decreases with the zoom level in a way other than being physically proportional to the zoom level, i.e., a ≠ -1.
  • the scaling is distorted in a way that achieves certain desirable results.
  • linear size means one-dimensional size. For example, if one considers any 2 dimensional object and doubles
  • the linear sizes of the elements of an object may involve length, width, radius, diameter, and/or any other measurement that one can read off with a ruler on the Euclidean plane.
  • the thickness of a line, the length of a line, the diameter of a circle or disc, the length of one side of a polygon, and the distance between two points are all examples of linear sizes. In this sense the "linear size" in two dimensions is the distance between two identified points of an object on a 2D Euclidean plane.
  • a < 0 will cause the rendered size of an element to decrease as one zooms out, and increase as one zooms in.
  • the rendered size of the element will decrease faster than it would with proportional physical scaling as one zooms out.
  • the size of the rendered element decreases more slowly than it would with proportional physical scaling as one zooms out.
  • p(z) for a given length of a given object, is permitted to be substantially continuous so that during navigation the user does not experience a sudden jump or discontinuity in the size of an element of the image (as opposed to the conventional approaches that permit the most extreme discontinuity - a sudden appearance or disappearance of an element during navigation).
  • p(z) monotonically decreases with zooming out such that zooming out causes the elements of the object to become smaller (e.g., roads to become thinner), and such that zooming in causes the elements of the object to become larger. This gives the user a sense of physicality about the object(s) of the image.
  • (i) that the scaling of the road widths may be physically proportional to the zoom level when zoomed in (e.g., up to about 0.5 meters/pixel); (ii) that the scaling of the road widths may be non-physically proportional to the zoom level when zoomed out (e.g., above about 0.5 meters/pixel); and (iii) that the scaling of the road widths may be physically proportional to the zoom level when zoomed further out (e.g., above about 50 meters/pixel or higher depending on parameters which will be discussed in more detail below).
  • a = -1.
  • z0 = 0.5 meters/pixel, or 2 pixels/meter, which when expressed as a map scale on a 15 inch display (with 1600x1200 pixel resolution) corresponds to a scale of about 1:2600.
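The quoted map scale can be checked with a few lines of arithmetic. The sketch assumes square pixels and that the 15 inch figure is the diagonal of the visible display area; the variable names are hypothetical.

```python
import math

diagonal_in = 15.0               # display diagonal in inches
width_px, height_px = 1600, 1200
pixels_per_meter_on_map = 2.0    # 0.5 meters/pixel is 2 pixels per ground meter

diagonal_px = math.hypot(width_px, height_px)       # 2000 pixels
pixels_per_inch = diagonal_px / diagonal_in         # about 133.3
pixels_per_screen_meter = pixels_per_inch / 0.0254  # 1 inch = 0.0254 m

# One meter of screen shows this many meters of ground.
map_scale = pixels_per_screen_meter / pixels_per_meter_on_map
print(round(map_scale))  # 2625, i.e. roughly 1:2600
```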
  • with d = 16 meters, which is a reasonable real physical width for A1 roads, the rendered road will appear to be its actual size when one is zoomed in (0.5 meters/pixel or less).
  • the rendered line is about 160 pixels wide.
  • this permits the A1 road to remain visible (and distinguishable from other smaller roads) as one zooms out.
  • the width of the rendered line using physical scaling would have been about 0.005 pixels at a zoom level of about 3300 meters/pixel, rendering it virtually invisible.
  • the width of the rendered line is about 0.8 pixels at a zoom level of 3300 meters/pixel, rendering it clearly visible.
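The two widths quoted for the 3300 meters/pixel level can be reproduced numerically. The exponent of the non-physical regime is not stated in this passage, so the sketch below solves for the value implied by the quoted 0.8 pixels, under the assumption that the non-physical curve is a pure power law that meets the physical curve at 0.5 meters/pixel. Variable names are hypothetical.

```python
import math

d = 16.0        # real width of an A1 road, meters
z0 = 0.5        # meters/pixel; physical scaling at or below this level
z_out = 3300.0  # meters/pixel; zoomed-out level quoted in the text

physical_px = d / z_out  # physical scaling, exponent -1
p_at_z0 = d / z0         # 32 pixels where the two regimes are assumed to meet

# ASSUMPTION: non-physical scaling follows p(z) = p(z0) * (z / z0)**a.
# Solve for the exponent a that yields the quoted 0.8 pixels at z_out.
target_px = 0.8
a = math.log(target_px / p_at_z0) / math.log(z_out / z0)

non_physical_px = p_at_z0 * (z_out / z0) ** a
print(f"physical: {physical_px:.4f} px, implied exponent: {a:.2f}")
```

Under these assumptions the implied exponent is about -0.42, which falls inside the typical range -1 < a < 0 mentioned earlier.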
  • the value for z1 is chosen to be the most zoomed-out scale at which a given road still has "greater than physical" importance.
  • the resolution would be approximately 3300 meters/pixel or 3.3 kilometers/pixel. If one looks at the entire world, then there may be no reason for U.S. highways to assume enhanced importance relative to the view of the country alone.
  • the scaling of the road widths is again physically proportional to the zoom level, but preferably with a large d' (much greater than the real width d) for continuity of p(z).
  • a new imputed physical width of the A1 highway is chosen, for example, d' = 1.65 kilometers. z1 and the new value for d' are preferably chosen in such a way that, at the outer scale z1, the rendered width of the line will be a reasonable number of pixels.
  • A1 roads may be about 1/2 pixel wide, which is thin but still clearly visible; this corresponds to an imputed physical road width of 1650 meters, or 1.65 kilometers.
  • p(z) has six parameters: z0, z1, d0, d1, d2 and a.
  • z0 and z1 mark the scales at which the behavior of p(z) changes.
  • zooming is physical (i.e., the exponent of z is -1), with a physical width of d0, which preferably corresponds to the real physical width d.
  • zooming is again physical, but with a physical width of d1, which in general does not correspond to d.
  • the rendered line width scales with a power law of a, which can be a value other than -1.
  • a power law
  • d0 = 8 meters
  • z0 = 0.5 meters/pixel
  • z1 = 50 meters/pixel
  • d2 = 100 meters.
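The three regimes can be combined into one continuous width function. The sketch below is one possible reading, not the patent's stated formula: it assumes the width listed as 100 meters is the imputed width of the zoomed-out physical regime, and that the exponent and coefficient of the middle regime are fixed by requiring continuity at both breakpoints. The names `make_width_fn` and `d_outer` are hypothetical.

```python
import math


def make_width_fn(d0: float, z0: float, z1: float, d_outer: float):
    """Build p(z): rendered width in pixels at zoom z (physical units/pixel).

    z <= z0      : physical scaling with width d0       (p = d0 / z)
    z0 < z < z1  : power law with exponent a            (p = k * z**a)
    z >= z1      : physical scaling with width d_outer  (p = d_outer / z)
    ASSUMPTION: a and k are derived from continuity at z0 and z1.
    """
    p0 = d0 / z0        # width in pixels at the inner breakpoint
    p1 = d_outer / z1   # width in pixels at the outer breakpoint
    a = math.log(p1 / p0) / math.log(z1 / z0)
    k = p0 / z0 ** a

    def p(z: float) -> float:
        if z <= z0:
            return d0 / z
        if z >= z1:
            return d_outer / z
        return k * z ** a

    return p, a


# Values listed in the text: 8 m, 0.5 m/pixel, 50 m/pixel and 100 m.
# ASSUMPTION: the 100 m value is used as the zoomed-out imputed width.
p, a = make_width_fn(d0=8.0, z0=0.5, z1=50.0, d_outer=100.0)
```

With these inputs the width falls from 16 pixels at 0.5 meters/pixel to 2 pixels at 50 meters/pixel, and the derived exponent lies between -1 and 0, so the line thins more slowly than physical scaling would dictate.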
  • the dotted lines all have a slope of -1 and represent physical scaling at different physical widths. From the top down, the corresponding physical widths of these dotted lines are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16 meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
  • interpolation between a plurality of pre-rendered images it is possible in many cases to ensure that the resulting interpolation is humanly indistinguishable or nearly indistinguishable from an ideal rendition of all lines or other primitive geometric elements at their correct pixel widths as determined by the physical and non-physical scaling equations.
  • this approach is designed to ensure that the line integral of the intensity function (or "1-intensity" function, for black lines on a white background) over a perpendicular to the line drawn is equal to the line width.
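The integral property can be illustrated for a vertical line by computing how much of each pixel column the line covers. This is a minimal sketch assuming a box-filter coverage model and pixel columns centered on integer coordinates; the function name is hypothetical and the patent's own rasterizer may differ.

```python
def column_coverage(center: float, width: float, n_columns: int) -> list:
    """Coverage (0..1) of each pixel column by a vertical line.

    Column i is taken to span [i - 0.5, i + 0.5], so an integer `center`
    places the line exactly on a pixel center. The coverage serves as the
    "1 - intensity" value for a black line on a white background.
    """
    left, right = center - width / 2.0, center + width / 2.0
    coverage = []
    for i in range(n_columns):
        overlap = min(right, i + 0.5) - max(left, i - 0.5)
        coverage.append(max(0.0, min(1.0, overlap)))
    return coverage


# A 2-pixel-wide line centered on column 3 of a 7-column image.
row = column_coverage(center=3, width=2.0, n_columns=7)
print(row)       # [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
print(sum(row))  # 2.0, equal to the line width
```

Summing the coverage along a row is the discrete counterpart of the line integral over a perpendicular to the line, which is why the total equals the line width as long as the line lies fully inside the image.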
  • This method generalizes readily to lines whose endpoints do not lie precisely in the centers of pixels, to lines which are in other orientations than vertical, and to curves. Note that drawing the antialiased vertical lines of FIGS. 16A-D could also be accomplished by alpha-blending two images, one (image A) in which the line is 1 pixel wide, and the other (image
  • In FIGS. 17A-C, a 1 pixel wide line (FIG. 17A), a 2 pixel wide line (FIG. 17B) and a 3 pixel wide line (FIG. 17C) are illustrated in an arbitrary orientation.
  • the same principle applies to the arbitrary orientation of FIGS. 17A-C as to the case where the lines are aligned exactly to the pixel grid, although the spacing of the line widths between which to alpha-blend may need to be finer than two pixels for good results.
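The alpha-blending idea can be illustrated on a one-dimensional cross-section perpendicular to a grid-aligned line. This is a sketch only: the function name is hypothetical, and the choice of a 3 pixel wide line as the second pre-rendered image is an assumption. The blended profile sums to the requested width, which is the line-integral property described above.

```python
def blend_line_profiles(width):
    """Cross-section of a line of the given width (1 to 3 pixels), built by
    alpha-blending a 1-pixel-wide line with a 3-pixel-wide line."""
    if not 1.0 <= width <= 3.0:
        raise ValueError("width must lie between the two pre-rendered widths")
    image_a = [0.0, 1.0, 0.0]    # coverage of the 1-pixel line, centered
    image_b = [1.0, 1.0, 1.0]    # coverage of the 3-pixel line, centered
    alpha = (width - 1.0) / 2.0  # weight given to the wider image
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(image_a, image_b)]
```

For a requested width of 2 pixels the sketch yields a full-intensity center pixel flanked by two half-intensity pixels.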
  • FIG. 18 is substantially similar to FIG. 14 except that FIG. 18 includes a set of horizontal lines and vertical lines.
  • the horizontal lines indicate line widths between 1 and 10 pixels, in increments of one pixel.
  • the vertical lines are spaced such that line width over the interval between two adjacent vertical lines changes by no more than two pixels.
  • the vertical lines represent a set of zoom values suitable for pre-rendition, wherein alpha-blending between two adjacent such pre-rendered images will produce characteristics nearly equivalent to rendering the lines representing roads at continuously variable widths.
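A set of pre-rendition scales of this kind might be chosen as sketched below. The sketch assumes purely physical scaling (width in pixels equals physical width divided by meters per pixel), which is a simplification; the function name and parameters are hypothetical.

```python
def prerender_scales(physical_width_m, max_px, min_px, step_px=2.0):
    """Scales (meters per pixel) at which to pre-render, chosen so that the
    line width changes by at most step_px between adjacent scales."""
    scales = []
    width = float(max_px)
    while width > min_px:
        scales.append(physical_width_m / width)
        width -= step_px
    scales.append(physical_width_m / min_px)   # always include the thinnest width
    return scales
```

For an 8 meter line rendered between 10 pixels and 1 pixel wide, this gives six scales corresponding to widths of 10, 8, 6, 4, 2 and 1 pixels.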
  • the present invention may be employed by an Internet website that provides maps and driving directions to client terminals in response to user requests.
  • various aspects of the invention may be employed in a GPS navigation system in an automobile.
  • the invention may also be incorporated into medical imaging equipment, whereby detailed information concerning, for example, a patient's circulatory system, nervous system, etc. may be rendered and navigated as discussed hereinabove.
  • the applications of the invention are too numerous to list in their entirety, yet a skilled artisan will recognize that they are contemplated herein and fall within the scope of the invention as claimed.
  • the present invention may also be utilized in connection with other applications in which the rendered images provide a means for advertising and otherwise advancing commerce. Additional details concerning these aspects and uses of the present invention may be found in U.S.
  • the present invention relates generally to graphical zooming user interfaces (ZUI) for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner that is both computationally efficient, resulting in good user responsiveness and interactive frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content is ultimately drawn without the resampling which would normally lead to degradation in image quality, and without interpolation of other images, which would also lead to degradation.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, and then zoom in on an area of interest.
  • zoomable components such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • while continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom and pan continuously in a user-friendly manner is absent from prior art systems.
  • a display is the device or devices used to output rendered imagery to the user.
  • a frame buffer is used to dynamically represent the contents of at least a portion of the display.
  • Display refresh rate is the rate at which the physical display, or portion thereof, is refreshed using the contents of the frame buffer.
  • a frame buffer's frame rate is the rate at which the frame buffer is updated.
  • the display refresh rate is 60-90 Hz.
  • Most digital video, for example, has a frame rate of 24-30 Hz.
  • each frame of digital video will actually be displayed at least twice as the display is refreshed.
  • Plural frame buffers may be utilized at different frame rates and thus be displayed substantially simultaneously on the same display. This would occur, for example, when two digital videos with different frame rates were being played on the same display, in different windows.
  • The complete set of LODs, organized conceptually as a stack of images of decreasing resolution, is termed the LOD pyramid (see Fig. 1).
  • the system interpolates between the LODs and displays a resulting image at a desired resolution. While this approach solves the computational issue, it displays a final compromised image that is often blurred and unrealistic, and often involves loss of information due to the fact that it represents interpolation of different LODs. These interpolation errors are especially noticeable when the user stops zooming and has the opportunity to view a still image at a chosen resolution which does not precisely match the resolution of any of the LODs.
  • the prior art typically treats vector data in the same way as photographic or image data.
  • Vector data, such as blueprints or line drawings, are displayed by processing a set of abstract instructions using a rendering algorithm, which can render lines, curves and other primitive shapes at any desired resolution.
  • Text rendered using scalable fonts is an important special case of vector data.
  • Image or photographic data (including text rendered using bitmapped fonts) are not so generated, but must be displayed either by interpolation between precomputed LODs or by resampling an original image. We refer to the latter herein as nonvector data.
  • a further object of the present invention is to allow the user to zoom arbitrarily far in on vector content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
  • a further object of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, while both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further object of the present invention is to diminish the user's perception of transitions between LODs or rendition qualities during interaction.
  • a further object of the present invention is to allow the graceful degradation of image quality by blurring when information ordinarily needed to render portions of the image is as yet incomplete.
  • a further object of the present invention is to gradually increase image quality by bringing it into sharper focus as more complete information needed to render portions of the image becomes available.
  • if the desired resolution is either greater than the resolution of the LOD with the highest available resolution or less than the resolution of the LOD with the lowest resolution, then there will be only a single "surrounding LOD".
  • the dynamic interpolation of an image at a desired resolution based on a set of precomputed LODs is termed in the literature mipmapping or trilinear interpolation. The latter term further indicates that bilinear sampling is used to resample the surrounding LODs, followed by linear interpolation between these resampled LODs (hence trilinear). See, e.g., Lance Williams, "Pyramidal Parametrics," Computer Graphics (Proc. SIGGRAPH '83) 17(3): 1-11 (1983).
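Selecting the surrounding LODs and the weight used to interpolate between them can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, resolutions are taken as a sorted list of samples per unit length, and the blend weight is computed on a logarithmic scale, which is one common choice rather than something the text prescribes.

```python
import bisect
import math


def surrounding_lods(lod_resolutions, desired):
    """Return (lower_index, upper_index, t) for an ascending list of LOD
    resolutions, where t is the weight of the upper (finer) LOD."""
    last = len(lod_resolutions) - 1
    if desired <= lod_resolutions[0]:
        return 0, 0, 0.0              # only a single surrounding LOD
    if desired >= lod_resolutions[last]:
        return last, last, 0.0        # only a single surrounding LOD
    upper = bisect.bisect_right(lod_resolutions, desired)
    lower = upper - 1
    lo, hi = lod_resolutions[lower], lod_resolutions[upper]
    t = math.log(desired / lo) / math.log(hi / lo)
    return lower, upper, t
```

A desired resolution halfway (geometrically) between two LODs gives a weight of 0.5; a resolution outside the range of the pyramid collapses to a single LOD.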
  • the final image is then displayed by preferably first displaying an intermediate final image.
  • the intermediate final image is the first image displayed at the desired resolution before that image is refined as described hereafter.
  • the intermediate final image may correspond to the image that would be displayed at the desired resolution using the prior art.
  • the transition from the intermediate final image to the final image may be gradual, as explained in more detail below.
  • the present invention allows LODs to be spaced in any resolution increments, including irrational increments (i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers), as explained in more detail below.
  • portions of the image at each different LOD are denoted tiles, and such tiles are rendered in an order that minimizes any perceived imperfections to a viewer.
  • the displayed visual content is made up of plural LODs (potentially a superset of the surrounding LODs as described above), each of which is displayed in the proper proportion and location in order to cause the display to gradually fade into the final image in a manner that conceals imperfections.
  • the present invention involves a hybrid strategy, in which an image is displayed using predefined LODs during rapid zooming and panning, but when the view stabilizes sufficiently, an exact LOD is rendered and displayed.
  • the exact LOD is rendered and displayed at the precise resolution chosen by the user, which is normally different from the predefined LODs. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with far less computation.
  • Figure 1 depicts an LOD pyramid (in this case the base of the pyramid, representing the highest-resolution representation, is a 512x512 sample image, and successive minifications of this image are shown in factors of 2);
  • Figure 2 depicts a flow chart for use in an exemplary embodiment of the invention;
  • Figure 3 is another flow chart that shows how the system displays the final image after zooming;
  • Figure 4 is the LOD pyramid of Figure 1 with grid lines added showing the subdivision of each LOD into rectangular tiles of equal size in samples;
  • Figure 5 is another flow chart, for use in connection with the present invention, and it depicts a process for displaying rendered tiles on a display;
  • Figure 6 shows a concept termed irrational tiling, explained in more detail herein;
  • Figure 7 depicts a composite tile and the tiles that make up the composite tile, as explained more fully below.
  • Figure 2 shows a flowchart of a basic technique for implementation of the present invention.
  • the flowchart of Figure 2 represents an exemplary embodiment of the invention and would begin executing when an image is displayed at an initial resolution.
  • the invention may be used in the client-server model, but the client and server may be on the same or different machines.
  • the actual hardware platform and system utilized are not critical to the present invention.
  • the flowchart is entered at start block 201 with an initial view of an image at a particular resolution. In this example, the image is taken to be static.
  • the image is displayed at block 202.
  • a user may navigate that image by moving, for example, a computer mouse.
  • the initial view displayed at block 202 will change when the user navigates the image.
  • the underlying image may itself be dynamic, such as in the case of motion video; however, for purposes of this example, the image itself is treated as static.
  • any image to be displayed may also have textual or other vector data and/or nonvector data such as photographs and other images.
  • the present invention, and the entire discussion below, is applicable regardless of whether the image comprises vector or nonvector data, or both.
  • the method transfers control to decision point 203 at which navigation input may be detected.
  • Decision point 203 may be implemented by a continuous loop in software looking for a particular signal that detects movement, an interrupt system in hardware, or any other desired methodology.
  • the particular technique utilized to detect and analyze the navigation request is not critical to the present invention. Regardless of the methodology used, the system can detect the request, thus indicating a desire to navigate the image.
  • Such transformations may include, for example, three dimensional translation and rotation, application of an image filter, local stretching, dynamic spatial distortion applied to selected areas of the image, or any other kind of distortion that might reveal more information.
  • Another example would be a virtual magnifying glass that can be moved over the image and that magnifies the parts of the image beneath it.
  • the selected LODs may be those two LODs that "surround" the desired resolution, i.e., the resolution of the new view.
  • the interpolation, in prior systems, constantly occurs as the user zooms and is thus often implemented directly in the hardware to achieve speed.
  • the combination of detection of movement in decision point 205 and a substantially immediate display of an appropriate interpolated image at block 204 results in the image appearing to zoom continuously as the user navigates. During zooming in or out, since the image is moving, an interpolated image is sufficient to look realistic and clear. Any interpolation error is only minimally detectable by the human visual system, as such errors are disguised by the constantly changing view of the image.
  • the system tests whether or not the movement has substantially ceased.
  • the methodology ascertains whether or not the user has arrived at the point where he has finished zooming.
  • control is transferred to block 206, where an exact image is rendered, after which control returns to block 203.
  • the system will eventually display an exact LOD.
  • the display is not simply rendered and displayed by an interpolation of two predefined LODs, but may be rendered and displayed by re-rendering vector data using the original algorithm used to render the text or other vector data when the initial view was displayed at block 202.
  • Nonvector data may also be resampled for rendering and displayed at the exact required LOD.
  • the required re-rendering or resampling may be performed not only at the precise resolution required for display at the desired resolution, but also on a sampling grid corresponding precisely to the correct positions of the display pixels relative to the underlying content, as calculated based on the desired view.
  • translation of the image on the display by a fraction of a pixel in the display plane does not change the required resolution, but it does alter the sampling grid, and therefore requires re-rendering or resampling of the exact LOD.
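The dependence of the sampling grid on the exact view can be seen in a one-axis sketch. The function and parameter names are hypothetical, and sampling at pixel centers is an assumption made for illustration.

```python
def sample_positions(view_origin, scale, pixel_count):
    """Content coordinates sampled by the centers of display pixels along one
    axis. view_origin is the content coordinate at the display's left edge;
    scale is content units per display pixel."""
    return [view_origin + (i + 0.5) * scale for i in range(pixel_count)]
```

Shifting `view_origin` by half of `scale` (a half-pixel translation) leaves the spacing between samples, and hence the resolution, unchanged, but every sample lands at a different content coordinate, so a previously rendered exact image cannot simply be reused.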
  • the foregoing system of Fig. 2 represents a hybrid approach in which interpolation based upon predefined LODs is utilized while the view is changing (e.g.
  • the term render refers to the generation by the computer of a tile at a specific LOD based upon vector or nonvector data. With respect to nonvector data, these may be rerendered at an arbitrary resolution by resampling an original image at higher or lower resolution.
  • this interpolated image, which may be temporarily displayed after the navigation ceases, is termed the intermediate final image, or simply an intermediate image.
  • This image is generated from an interpolation of the surrounding LODs.
  • the intermediate image may be interpolated from more than two discrete LODs, or from two discrete LODs other than the ones that surround the desired resolution.
  • block 304 is entered, which causes the image to begin to gradually fade towards an exact rendition of the image, which we term the final image.
  • the final image differs from the intermediate image in that the final image may not involve interpolation of any predefined LODs. Instead, the final image, or portions thereof, may comprise newly rendered tiles.
  • the newly rendered tiles may result from resampling the original data, and in the case of vector data, the newly rendered tiles may result from rasterization at the desired resolution.
  • step 304 is executed so the changeover from the intermediate final image to the final image is done gradually and smoothly. This gradual fading, sometimes called blending, causes the image to come into focus gradually when navigation ceases, producing an effect similar to automatic focusing in cameras or other optical instruments. The illusion of physicality created by this effect is an important aspect of the present invention.
  • a first LOD may take a 1 inch by 1 inch area of a viewable object and generate a single 32 by 32 sample tile.
  • the information may also be rendered by taking the same 1 inch by 1 inch area and representing it as a tile that is 64 by 64 samples, and therefore at a higher resolution.
  • Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher-resolution LOD to the linear tiling grid size at the next lower-resolution LOD.
  • g = 2
  • This same value of g has been used in other prior art.
  • although LODs may be subdivided into tiles in any fashion, in an exemplary embodiment each LOD is subdivided into a grid of square or rectangular tiles containing a constant number of samples (except, as required, at the edges of the visual content).
  • zooming in on any point will therefore produce a quasi-random stream of requests for 1, 2 or 4 tiles, and performance will be on average uniform when zooming in everywhere.
  • irrational tiling emerges in connection with panning after a deep zoom. When the user pans the image after having zoomed in deeply, at some point a grid line will be moved onto the display.
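One plausible reading of the panning benefit is that, with g = 2, a grid line of a coarse LOD is also a grid line of every finer LOD, so tiles at many LODs are requested at the same moment when that line moves onto the display. The sketch below counts how many LODs share a grid line at a given position. It is illustrative only: the function name is hypothetical, and the constant e is used merely as a convenient irrational granularity, not as a value taken from the text.

```python
import math


def lods_with_grid_line_at(x, coarsest_tile_size, g, lod_count, tol=1e-9):
    """Count the LODs whose tile grid has a grid line at content position x.
    LOD k has tiles of size coarsest_tile_size / g**k."""
    count = 0
    for k in range(lod_count):
        tile_size = coarsest_tile_size / g ** k
        ratio = x / tile_size
        if abs(ratio - round(ratio)) < tol:   # x is a whole number of tiles
            count += 1
    return count
```

At the first grid line of the coarsest LOD, `lods_with_grid_line_at(1.0, 1.0, 2.0, 6)` finds a grid line in all six LODs, whereas with `g = math.e` only the coarsest LOD has one there.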
  • Figure 6(b) illustrates the advantage gained by irrational tiling granularity.
  • Figure 6 shows cross-sections through several LODs of the visual content; each bar represents a cross-section of a rectangular tile.
  • the curves 601, drawn from top to bottom, represent the bounds of the visible area of the visual content at the relevant LOD during a zooming operation: as the resolution is increased (zooming in to reveal more detail), the area under examination decreases.
  • Darker bars (e.g., 602) represent tiles which have already been rendered over the course of the zoom.
  • An important aspect of the invention is the order in which the tiles are rendered. More particularly, the various tiles of the various LODs are optimally rendered such that all visible tiles are rendered first. Nonvisible tiles may not be rendered at all. Within the set of visible tiles, rendition proceeds in order of increasing resolution, so that tiles within low-resolution LODs are rendered first.
  • tiles are rendered in order of increasing distance from the center of the display, which we refer to as foveated rendering.
  • many sorting algorithms such as heapsort, quicksort, or others may be used.
  • a lexicographic key may be used for sorting "requests" to render tiles, such that the outer subkey is visibility, the middle subkey is resolution in samples per physical unit, and the inner subkey is distance to the center of the display.
  • Other methods for ordering tile rendering requests may also be used.
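The lexicographic ordering described above maps naturally onto a tuple sort key. A minimal sketch follows; the `TileRequest` class and its field names are hypothetical, and nonvisible requests are simply placed last rather than dropped.

```python
from dataclasses import dataclass


@dataclass
class TileRequest:
    name: str
    visible: bool
    resolution: float           # samples per physical unit
    distance_to_center: float   # distance from the center of the display


def render_order(requests):
    """Sort requests: visible first, then lower resolution, then nearer center."""
    return sorted(
        requests,
        key=lambda r: (not r.visible, r.resolution, r.distance_to_center),
    )
```

Because Python compares tuples element by element, the first element acts as the outer subkey, the second as the middle subkey, and the third as the inner subkey.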
  • the actual rendering of the tiles optimally takes place as a parallel process with the navigation and display described herein. When rendering and navigation/display proceed as parallel processes, user responsiveness may remain high even when tiles are slow to render.
  • if a tile represents vector data, such as alphabetic typography in a stroke-based font, then rendering of the tile would involve running the algorithm to rasterize the alphabetic data and possibly transmitting that data to a client from a server.
  • the data fed to the rasterization algorithm could be sent to the client, and the client could run the algorithm to rasterize the tile.
  • rendering of a tile involving digitally sampled photographic data could involve resampling of that data to generate the tile at the appropriate LOD. For discrete LODs that are prestored, rendering may involve no more than simply transmitting the tile to a client computer for subsequent display.
  • the actual display may comprise different mixes of different tiles from different LODs.
  • any portion of the display could contain for example, 20% from LOD 1, 40% from LOD 2, and 40% from LOD 3.
  • the algorithm attempts to render tiles from the various LODs in a priority order best suited to supply the rendered tiles for display as they are most needed.
  • the actual display of the rendered tiles will be explained in more detail later with reference to Figure 5.
  • the algorithm is designed to make the best use of all rendered tiles, using high-resolution tiles in preference to lower-resolution tiles covering the same display area, yet using spatial blending to avoid sharp boundaries between LODs, and temporally graduated blending weights to blend in higher detail if and when it becomes available (i.e. when higher-resolution tiles have been rendered).
  • this algorithm and variants thereof can result in more than two LODs being blended together at a given point on the display; it can also result in blending coefficients that vary smoothly over the display area; and it can result in blending coefficients that evolve in time even after the user has stopped navigating.
  • a composite tile area or simply a composite tile.
  • To define a composite tile we consider all of the LODs stacked on top of each other. Each LOD has its own tile grid. The composite grid is then formed by the projection of all of the grids from all of the LODs onto a single plane. The composite grid is then made up of various composite tiles of different sizes, defined by the boundaries of tiles from all of the different LODs. This is shown conceptually in Fig. 7. Fig. 7 depicts the tiles from three different LODs, 701 through 703, all representing the same image.
  • Fig. 7 shows that there would be a single "composite tile" 710.
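Along a single axis, the composite grid is the union of the grid lines of every LOD. The sketch below illustrates this; the function names are hypothetical, and tile sizes are expressed in a common content unit, which is an assumption made for illustration.

```python
def composite_boundaries(extent, tile_sizes):
    """Sorted grid-line positions of the composite grid along one axis.
    tile_sizes holds the tile size of each LOD in common content units."""
    edges = {0.0, float(extent)}
    for size in tile_sizes:
        k = 1
        while k * size < extent:
            edges.add(k * float(size))   # project this LOD's grid line
            k += 1
    return sorted(edges)


def composite_tiles(extent, tile_sizes):
    """Composite tiles as (start, end) intervals between adjacent grid lines."""
    edges = composite_boundaries(extent, tile_sizes)
    return list(zip(edges[:-1], edges[1:]))
```

For an extent of 12 with tile sizes 6, 4 and 3, the composite grid lines fall at 0, 3, 4, 6, 8, 9 and 12, and each resulting composite tile lies within exactly one tile of each LOD.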
  • the frame rate may be typically greater than ten frames per second. Note that, as explained above, this frame rate is not necessarily the display refresh rate.
  • Fig. 5 depicts a flow chart of an algorithm for updating the frame buffer as tiles are rendered.
  • the arrangement of Fig. 5 is intended to operate on every composite tile in the displayed image each time the frame buffer is updated.
  • a frame duration is 1/20 of a second
  • each of the composite tiles on the entire screen would preferably be examined and updated during each 1/20 of a second.
  • the composite tile may lack the relevant tiles in one or more LODs.
  • the process of Fig. 5 attempts to display each composite tile as a weighted average of all the available superimposed tiles within which the composite tile lies. Note that composite tiles are defined in such a way that they fall within exactly one tile at any given LOD; hence the weighted average can be expressed as a relative proportion of each LOD.
  • the process attempts to determine the appropriate weights for each LOD within the composite tile, and to vary those weights gradually over space and time to cause the image to gradually fade towards the final images discussed above.
  • the composite grid includes plural vertices which are defined to be any intersection or corner of gridlines in the composite grid. These are termed composite grid vertices.
  • the current weights at any particular time for each LOD at each vertex are maintained in memory.
  • the algorithm for updating vertex weights proceeds as described below.
  • Both of these variables are again numbers between 0.0 and 1.0, and are maintained for each vertex in the composite tiling.
  • the algorithm walks through each LOD in turn, in order from highest-resolution to lowest, performing the following operations. First 0.0 is assigned to levelOpacityGrid at all vertices. Then, for each rendered tile at that LOD (which may be a subset of the set of tiles at that LOD, if some have not yet been rendered), the algorithm updates the parts of the levelOpacityGrid touching that tile based on the tile's centerOpacity, cornerOpacity and edgeOpacity values. If the vertex is entirely in the interior of the tile, then it gets updated using centerOpacity.
  • if the vertex is, e.g., on the tile's left edge, it gets updated with the left edgeOpacity.
  • if the vertex is, e.g., on the top right corner, it gets updated with the top right cornerOpacity.
  • "Updating" means the following: if the pre-existing levelOpacityGrid value is greater than 0.0, then set the new value to the minimum of the present value, or the value it's being updated with. If the pre-existing value is zero (i.e. this vertex hasn't been touched yet) then just set the levelOpacityGrid value to the value it's being updated with.
  • the levelOpacityGrid at each vertex position gets set to the minimum nonzero value with which it gets updated.
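The "updating" rule can be written as a small function. In this sketch the grid is represented as a dictionary keyed by vertex, which is an assumption; the function name is hypothetical.

```python
def update_vertex(level_opacity_grid, vertex, value):
    """Apply the update rule: keep the minimum of the values seen so far,
    treating a stored 0.0 as 'not yet touched'."""
    current = level_opacity_grid.get(vertex, 0.0)
    if current > 0.0:
        level_opacity_grid[vertex] = min(current, value)
    else:
        level_opacity_grid[vertex] = value
```

Calling it on the same vertex with 0.8, then 0.5, then 0.9 leaves the vertex at 0.5, the minimum nonzero value with which it was updated.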
  • the algorithm then walks through the levelOpacityGrid and sets to 0.0 any vertices that touch a tile which has not yet been rendered, termed a hole. This ensures spatial continuity of blending: wherever a composite tile falls within a hole, at the current LOD, drawing opacity should fade to zero at all vertices abutting that hole.
  • the algorithm can then relax all levelOpacityGrid values to further improve spatial continuity of LOD blending.
  • Every vertex is like a tentpole, where the levelOpacityGrid value at that point is the tentpole's height.
  • the algorithm has thus far ensured that at all points bordering on a hole, the tentpoles have zero height; and in the interior of tiles that have been rendered, the tentpoles are set to some (probably) nonzero value.
  • all the values inside a rendered tile are set to 1.0.
  • the border values are 0.0.
  • the relax operation smoothes out the tent, always preserving values of 0.0, but possibly lowering other tentpoles to make the function defined by the tent surface smoother, i.e. limiting its maximum spatial derivative. It is immaterial to the invention which of a variety of methods are used to implement this operation; one approach, for example, is to use selective low-pass filtering, locally replacing every nonzero value with a weighted average of its neighbors while leaving zeroes intact. Other methods will also be apparent to those skilled in the art.
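The selective low-pass filtering mentioned above might look like the following one-dimensional sketch. The 1/4, 1/2, 1/4 weights, the treatment of the ends, and the rule that a value is only ever lowered are assumptions chosen for illustration; the text leaves the method open.

```python
def relax(values, passes=1):
    """Smooth a row of levelOpacityGrid values while leaving zeroes intact."""
    result = list(values)
    for _ in range(passes):
        previous = list(result)
        for i, v in enumerate(previous):
            if v == 0.0:
                continue                       # vertices bordering holes stay at zero
            left = previous[i - 1] if i > 0 else v
            right = previous[i + 1] if i + 1 < len(previous) else v
            smoothed = 0.25 * left + 0.5 * v + 0.25 * right
            result[i] = min(v, smoothed)       # only ever lower a tentpole
    return result
```

Applied once to a row of 0.0, 1.0, 1.0, 1.0, the value next to the zero drops to 0.75 while the others are unchanged; additional passes spread the slope further into the tile.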
  • the algorithm then walks over all composite grid vertices, considering corresponding values of levelOpacityGrid and opacityGrid at each vertex: if levelOpacityGrid is greater than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0-opacityGrid. Then, again for each vertex, corresponding values of levelOpacityGrid are added to opacityGrid. Due to the previous step, this can never bring opacityGrid above 1.0. These steps in the algorithm ensure that as much opacity as possible is contributed by higher-resolution LODs when they are available, allowing lower-resolution LODs to "show through" only where there are holes.
  • levelOpacityGrid can be multiplied by a scalar overallOpacity variable in the range 0.0 to 1.0 just before drawing; this allows the entire image to be drawn with partial transparency given by the overallOpacity.
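The clamping and accumulation of opacities across LODs can be sketched as follows. The data layout (one list of per-vertex opacities per LOD, highest resolution first) and the function name are assumptions.

```python
def composite_weights(level_opacities):
    """Clamp each LOD's per-vertex opacity to the opacity still unclaimed by
    higher-resolution LODs, and accumulate the running total.
    level_opacities: one list per LOD, highest resolution first."""
    opacity_grid = [0.0] * len(level_opacities[0])
    weights = []
    for level in level_opacities:
        clamped = [min(v, 1.0 - total) for v, total in zip(level, opacity_grid)]
        opacity_grid = [total + c for total, c in zip(opacity_grid, clamped)]
        weights.append(clamped)
    return weights, opacity_grid
```

With a high-resolution LOD at opacities 1.0, 0.25, 0.0 and a fully rendered low-resolution LOD, the low-resolution LOD contributes 0.0, 0.75 and 1.0 at the three vertices, so it shows through only where the finer LOD is faint or missing.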
  • drawing an image-containing polygon, such as a rectangle, with different opacities at each vertex is a standard procedure. It can be accomplished, for example, using industry-standard texture mapping functions using the OpenGL or Direct3D graphics libraries.
  • the drawn opacity within the interior of each such polygon is spatially interpolated, resulting in a smooth change in opacity over the polygon.
  • tiles maintain not only their current values of centerOpacity, cornerOpacity and edgeOpacity (called the current values), but also a parallel set of values called targetCenterOpacity, targetCornerOpacity and targetEdgeOpacity (called the target values).
  • the current values are all set to 0.0 when a tile is first rendered, but the target values are all set to 1.0. Then, after each frame, the current values are adjusted to new values closer to the target values.
  • newValue = oldValue*(1-b) + targetValue*b, where b is a rate greater than 0.0 and less than 1.0.
  • a value of b close to 0.0 will result in a very slow transition toward the target value, and a value of b close to 1.0 will result in a very rapid transition toward the target value.
  • This method of updating opacities results in exponential convergence toward the target, and results in a visually pleasing impression of temporal continuity.
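The per-frame update (the new value is the old value weighted by 1-b plus the target value weighted by b) can be simulated directly. The function names below are hypothetical.

```python
def step_opacity(current, target, b):
    """One frame of the update: move a fraction b of the way to the target."""
    return current * (1.0 - b) + target * b


def fade(current, target, b, frames):
    """Opacity after each of the given number of frames."""
    values = []
    for _ in range(frames):
        current = step_opacity(current, target, b)
        values.append(current)
    return values
```

Each frame multiplies the remaining gap to the target by (1-b), so after n frames the gap is (1-b)^n of its starting size, which is the exponential convergence noted above. With b = 0.5, a tile starting at 0.0 with a target of 1.0 passes through 0.5, 0.75 and 0.875 over its first three frames.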
  • Other formulae can achieve the same result.
  • the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest.
  • zoomable components such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • while continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s; recent movies continue the trend.
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have
  • the present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Voss) is based.
  • Voss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture.
  • This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid).
  • the present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid.
  • these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
  • An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
  • a further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened" from a computer application, as traditionally the entirety of such a document must be "loaded" before viewing or editing can begin.
  • random access memory (RAM)
  • this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
  • Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem.
  • Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
  • the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short.
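The resolution hierarchy just described can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `pyramid_sizes` and the round-up rule for odd dimensions are assumptions. It also shows that storing every level costs less than one third more pixels than the full-resolution image alone.

```python
def pyramid_sizes(width, height):
    """Return (width, height) for every level, finest first, halving until 1x1.

    Rounding up on odd sizes is an assumption; the text only gives the
    power-of-two example.
    """
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, (width + 1) // 2)
        height = max(1, (height + 1) // 2)
        sizes.append((width, height))
    return sizes


sizes = pyramid_sizes(512, 512)
total_pixels = sum(w * h for w, h in sizes)
# Ratio of all levels together to the finest level alone (bounded by 4/3).
overhead = total_pixels / (512 * 512)
```

For the 512x512 example this yields ten levels, from 512x512 down to 1x1.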
  • a low-resolution image serves as a "predictor" for the next higher resolution.
  • This allows the entire image hierarchy to be encoded very efficiently (more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone). If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission.
  • an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
  • Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order.
  • This model is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session.
  • the computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
  • each level of detail is subdivided into a grid, such that a grid square, or tile, is the basic unit of transmission.
  • the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
  • This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages.
  • Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined.
  • Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc.
  • is no longer possible if we have zoomed so far in that a single letter fills the entire screen. Hence a zooming user interface may also restrict the action of certain methods to their relevant levels of detail.
  • a visual document is not represented internally as an image, but as more abstract data (such as text, spreadsheet entries, or vector graphics), it is necessary to generalize the tiling concept introduced in the previous section.
  • the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile.
  • the speed bottleneck is normally the transfer of compressed data to the computer (e.g. downloading).
  • the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay.
  • tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
  • a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
  • documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document.
  • a node, borrowing from programming terminology for trees.
  • drawing methods are defined for all nodes at all levels of detail, other methods corresponding to application-specific functionality may be defined only for certain nodes, and their action may be restricted only to certain levels of detail.
  • some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment (a "supernode") which can be navigated by zooming and panning.
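The document tree described above can be sketched as a small data structure. Everything here is illustrative: the `Node` class, its field names, and the choice to express all bounds in one shared coordinate space are assumptions, not details from the text. The sketch only enforces the stated rule that each child lies within the spatial boundary of its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One document in the tree; bounds are (x0, y0, x1, y1) in a shared space."""
    name: str
    bounds: tuple
    children: list = field(default_factory=list)

    def add_child(self, child):
        """Attach a sub-document, rejecting it if it leaves the parent's boundary."""
        x0, y0, x1, y1 = self.bounds
        cx0, cy0, cx1, cy1 = child.bounds
        if not (x0 <= cx0 and y0 <= cy0 and cx1 <= x1 and cy1 <= y1):
            raise ValueError(f"{child.name} is not contained in {self.name}")
        self.children.append(child)
        return child

    def deepest_at(self, x, y):
        """Return the most deeply nested node containing the point, or None."""
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            return None
        for child in self.children:
            hit = child.deepest_at(x, y)
            if hit is not None:
                return hit
        return self


root = Node("supernode", (0, 0, 100, 100))
page = root.add_child(Node("page", (10, 10, 50, 50)))
photo = page.add_child(Node("photo", (20, 20, 30, 30)))
```

A lookup such as `root.deepest_at(25, 25)` walks down to the innermost sub-document under that point, which is one way node-specific methods could be dispatched.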
  • There are a number of immediate consequences for a well-implemented zooming user interface, including:
  • - It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
  • - Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
  • - Zooming and panning are spatially intuitive operations, allowing large amounts of information to be organized in an easily understood way.
  • - Since "screen space" is essentially unlimited, it is not necessary to minimize windows, use multiple desktops, or hide windows behind each other to work on multiple documents or views at once.
  • documents can be arranged as desired, and the user can zoom out for an overview of all of them, or in on particular ones. This does not preclude the possibility of rearranging the positions (or even scales) of such documents to allow any combination of them to be visible at a useful scale on the screen at the same time. Neither does it necessarily preclude combining zooming with more traditional approaches.
  • - Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
  • - High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
  • the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f ≤ 1, plus the next higher level of detail (to allow LOD blending; see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area (in general, an area considerably larger than the display).
  • the coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
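The queuing rule above (every level that undersamples the display, plus one finer level for blending, coarsest first) might be sketched as follows. This is a hedged illustration: the function name, the parameter `display_scale`, and the definition of f as level samples per display pixel are assumptions, since the excerpt does not define f explicitly.

```python
def lods_to_request(num_levels, display_scale, available=frozenset(), g=2.0):
    """Return the level indices to queue, coarsest first.

    Assumptions: level k holds g**k samples per unit of content, the display
    shows `display_scale` pixels per unit, and f = g**k / display_scale.
    """
    wanted = []
    for k in range(num_levels):
        f = g ** k / display_scale
        wanted.append(k)
        if f > 1.0:
            # First level that oversamples the display: keep it so that it
            # can be blended with the level below, then stop.
            break
    return [k for k in wanted if k not in available]


queue = lods_to_request(num_levels=10, display_scale=20.0, available={0, 1})
```

With these inputs, levels 0 to 4 undersample the display and level 5 is the extra blending level, so the queue is `[2, 3, 4, 5]` once the two cached levels are skipped.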
  • By robustness we mean that the client is never "at a loss" regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled.
  • the client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node.
  • tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3.
  • This technology is inspired by the human eye, which has a central region (the fovea) specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center. The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out (see Figure 4).
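Foveated queuing reduces to a sort by distance from the screen center. The sketch below is only one possible reading: the text does not say which point of a tile the distance is measured from, so measuring from each tile's center is an assumption, as are the function and parameter names.

```python
import math


def foveated_order(tiles, screen_w, screen_h):
    """Sort tile rectangles (x0, y0, x1, y1), in screen pixels, nearest first."""
    cx, cy = screen_w / 2.0, screen_h / 2.0

    def distance_to_center(tile):
        x0, y0, x1, y1 = tile
        # Assumption: distance is taken from the tile's own center point.
        return math.hypot((x0 + x1) / 2.0 - cx, (y0 + y1) / 2.0 - cy)

    return sorted(tiles, key=distance_to_center)


pending = [(0, 0, 100, 100), (350, 250, 450, 350), (500, 300, 600, 400)]
ordered = foveated_order(pending, screen_w=800, screen_h=600)
```

On an 800x600 display the tile straddling the center is requested first and the corner tile last.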
  • the temporal blend can be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend-in interval the new tile is 50% opaque), exponential, or follow any other interpolating function.
  • every small constant interval of time corresponds to a constant percent change in the opacity; for example, the new tile may become 20% more
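The two blend-in schedules mentioned here can be compared with a short sketch. The linear form follows the text directly. The exponential form is an assumption about what the truncated sentence describes: each fixed time step closes a constant fraction (20% in the example) of the remaining gap to full opacity. Function and parameter names are hypothetical.

```python
def linear_opacity(elapsed, blend_interval):
    """Opacity rises linearly from 0 to 1 over a fixed blend-in interval."""
    if blend_interval <= 0:
        return 1.0
    return min(1.0, max(0.0, elapsed / blend_interval))


def exponential_opacity(steps, rate=0.2):
    """Opacity after `steps` equal time steps.

    Assumption: each step closes a fraction `rate` of the remaining gap to
    full opacity, so the tile approaches 100% without a hard end point.
    """
    return 1.0 - (1.0 - rate) ** steps
```

Under the linear schedule a tile is 50% opaque halfway through the interval; under this exponential reading it is 20% opaque after one step and 36% after two.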
  • FIG. 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard.
  • Tile X, bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge. It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque.
  • Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges.
  • Part (c) shows a bilinear method for interpolating opacity over a triangle, when the opacities of all three corners abc may be different.
  • every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C.
  • the opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e.
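The area-weighted interpolation over a triangle is the standard barycentric scheme, and it can be sketched as below together with the rectangle case. Since the figure is not reproduced here, the pairing of each corner with the sub-triangle opposite it is the usual barycentric convention rather than something read off the figure; all names are illustrative.

```python
def tri_area(p, q, r):
    """Unsigned area of the triangle with 2D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def triangle_opacity(p, a, b, c, op_a, op_b, op_c):
    """Interpolate corner opacities at a point p inside triangle abc.

    Each corner is weighted by the fractional area of the sub-triangle
    opposite it. Valid for points inside or on the triangle.
    """
    total = tri_area(a, b, c)
    w_a = tri_area(p, b, c) / total
    w_b = tri_area(a, p, c) / total
    w_c = tri_area(a, b, p) / total
    return w_a * op_a + w_b * op_b + w_c * op_c


def rect_opacity(t, op_edge0, op_edge1):
    """Linear blend across a rectangle; t is the fractional distance from edge0."""
    return (1.0 - t) * op_edge0 + t * op_edge1


a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
at_corner = triangle_opacity(a, a, b, c, 1.0, 0.0, 0.0)
at_centroid = triangle_opacity((1 / 3, 1 / 3), a, b, c, 0.3, 0.6, 0.9)
```

At a corner the result equals that corner's opacity, and at the centroid it is the mean of the three, so opacity varies continuously across shard boundaries.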
  • this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
  • 5. Generalized linear-mipmap-linear LOD blending.
  • each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time.
  • the target opacity is 100% if the level of detail undersamples the display, i.e. f ≤ 1 (see #1).
  • the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold.
  • this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
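The target-opacity rule can be written as a single function of the sampling factor f and the granularity g. This is a sketch under assumptions: f is taken to be level samples per display pixel, and the falloff is made linear in f itself, which is only one of the monotonic choices the text allows.

```python
def target_opacity(f, g=2.0):
    """Target opacity of a level of detail given its sampling factor f.

    Fully opaque while the level undersamples the display (f <= 1), fading
    linearly to zero when it oversamples g-fold (f >= g).
    """
    if f <= 1.0:
        return 1.0
    if f >= g:
        return 0.0
    return (g - f) / (g - 1.0)
```

With g = 2, a level that oversamples the display 1.5-fold is drawn at 50% over the level below it, so the apparent detail changes smoothly through a zoom.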
  • the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet.
  • a single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in" at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
  • the simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold).
  • the target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete.
  • the highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above.
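For a single display pixel, the back-to-front rendering described above amounts to repeated "over" blending. The sketch below uses scalar intensities for brevity; the function name and the scalar simplification are assumptions made for illustration.

```python
def paint_over(layers, background=0.0):
    """Composite (value, opacity) layers listed coarsest first.

    Each layer is blended over the running result, so a fully opaque layer
    hides everything beneath it and a partial layer lets it show through.
    """
    result = background
    for value, opacity in layers:
        result = opacity * value + (1.0 - opacity) * result
    return result


# Coarse level at full opacity, finest level at 50% target opacity.
pixel = paint_over([(0.2, 1.0), (0.8, 0.5)])
```

Here the pixel ends up halfway between the coarse and fine values, which is the intended blend when the finest level oversamples the display partway toward g-fold.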
  • this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
  • 6. Motion anticipation. During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction).
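Linear extrapolation of the view can be sketched as follows. How the prediction is then used is not stated in this excerpt; requesting tiles for the predicted view is a natural guess but remains an assumption. The view state layout `(x, y, log2_zoom)` and the function name are likewise hypothetical.

```python
def extrapolate_view(prev, curr, dt_prev, dt_ahead):
    """Predict the view state dt_ahead seconds after `curr`.

    prev and curr are (x, y, log2_zoom) tuples sampled dt_prev seconds
    apart; each component is extrapolated along a straight line.
    """
    return tuple(c + (c - p) / dt_prev * dt_ahead for p, c in zip(prev, curr))


predicted = extrapolate_view(
    prev=(0.0, 0.0, 1.0), curr=(10.0, -5.0, 1.5), dt_prev=0.5, dt_ahead=1.0
)
```

Extrapolating the zoom in log space keeps a steady zoom gesture linear, which is a design choice of this sketch rather than something the text specifies.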
  • the present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time.
  • the invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
  • the invention applies in situations in which visual data can be obtained "on the fly" at different levels of detail, for example, from a camera with machine-controllable pan and zoom.
  • the present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
  • the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format).
  • This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g.
  • granularity may change at different scales, but here, for example and without limitation,
  • each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission.
  • the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail.
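The tile counts quoted for the 512x512 example follow from a ceiling division by the nominal tile size. In the sketch below the 64-pixel tile edge is inferred from the 8x8 figure (512 / 8) rather than stated in the text, and the helper names are hypothetical. The second helper shows how edge tiles end up smaller than nominal.

```python
import math

TILE = 64  # assumed nominal tile edge in pixels, inferred from 512 / 8


def tile_grid(level_width, level_height, tile=TILE):
    """Return (columns, rows) of tiles needed to cover one level of detail."""
    cols = max(1, math.ceil(level_width / tile))
    rows = max(1, math.ceil(level_height / tile))
    return cols, rows


def tile_rect(col, row, level_width, level_height, tile=TILE):
    """Pixel rectangle (x0, y0, x1, y1) of one tile, clipped at the image edge."""
    x0, y0 = col * tile, row * tile
    return x0, y0, min(x0 + tile, level_width), min(y0 + tile, level_height)


grids = {size: tile_grid(size, size) for size in (512, 256, 128, 64, 32, 1)}
```

For a 100x50 level, `tile_rect(1, 0, 100, 50)` gives a clipped 36x50 tile, which is the "small tiles at the edges" case.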
  • the JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images. If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, are needed to produce an accurate rendition. In a dynamic context, however, these may not all be available.
  • Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
  • the edge regions of tiles reserved for blending are referred to as blending flaps.
  • the simple reference implementation for displaying a finished composite image is a "painter's algorithm": all relevant tiles (that is, tiles overlapping the display area) in the coarsest level of detail are drawn first, followed by all relevant tiles in progressively finer levels of detail. At each level of detail, blending is applied at the edges of incomplete areas as described. The result, as desired, is that coarser levels of detail "show through" only in places where they are not obscured by finer levels of detail.
  • Although this simple algorithm works, it has several drawbacks: first, it is wasteful of processor time, as tiles are drawn even when they will ultimately be partly or even completely obscured.
  • the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.
  • the invention resolves these issues, while preserving all the advantages of the painter's algorithm.
  • One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application.
  • Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons.
  • the areas of tiles at lower levels of detail be larger than the areas of tiles at higher levels of detail; the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant.
  • an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art.
  • the improved algorithm consists of four stages. In the first stage, a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn.
  • there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis.
  • an n * m table with entries corresponding to the squares of the grid.
  • Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
  • the second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naive implementation).
  • Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly.
  • the per-tile opacity used in this step is stored in the tile data structure.
  • the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in tile shards where they are partly visible.
  • the third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%).
  • This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application. The refinement comes from the fact that the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles.
  • the composite gridding will be at least as fine as necessary.
  • This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards.
  • the composite grid is again traversed, and the tile shards are actually drawn.
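The second stage (walking tiles from finest to coarsest and filling each composite-grid square until it reaches full opacity) might look like the sketch below. The text says only that the square's opacity is increased "accordingly"; scaling each tile's opacity by the coverage still missing is an assumption, chosen because it reproduces the weights the painter's algorithm would give. The dictionary layout and names are also hypothetical, and the third-stage smoothing is omitted.

```python
def accumulate_shards(tiles, n_rows, n_cols):
    """Fill a composite grid front to back.

    tiles: dicts with 'name', 'lod', 'opacity' and 'cells' = (r0, c0, r1, c1),
    a half-open range of composite-grid squares covered by the tile.
    Returns a grid whose entries are [summed_opacity, [(name, weight), ...]].
    """
    grid = [[[0.0, []] for _ in range(n_cols)] for _ in range(n_rows)]
    for tile in sorted(tiles, key=lambda t: t["lod"], reverse=True):
        r0, c0, r1, c1 = tile["cells"]
        for r in range(r0, r1):
            for c in range(c0, c1):
                entry = grid[r][c]
                if entry[0] < 1.0:
                    # Assumption: a tile contributes its opacity times the
                    # coverage not yet claimed by finer tiles.
                    weight = tile["opacity"] * (1.0 - entry[0])
                    if weight > 0.0:
                        entry[1].append((tile["name"], weight))
                        entry[0] += weight
    return grid


tiles = [
    {"name": "coarse", "lod": 0, "opacity": 1.0, "cells": (0, 0, 2, 2)},
    {"name": "fine", "lod": 1, "opacity": 0.5, "cells": (0, 0, 1, 1)},
]
grid = accumulate_shards(tiles, n_rows=2, n_cols=2)
```

A coarse tile is never listed in a square that a fully opaque finer tile already covers, which is where the saving over drawing every tile comes from.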
  • Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naive algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all.
  • naive algorithm, which draws every tile intersecting with the displayed area in its entirety.
  • An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naive algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
  • the composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal.
  • Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be kept track of, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed.
  • the algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers.
  • This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in order n*log(n) time (where n is the number of vertices or triangles added).
  • the resulting triangles are moreover the basic primitive used for graphics rendering on most graphics platforms.
  • FIGURE 1 A first figure.
  • the present invention is directed to methods and apparatus for the application of image navigation techniques in advancing commerce, for example, by way of providing new environments for advertising and purchasing products and/or services.
  • mapping and geospatial applications are a booming industry. They have been attracting rapidly increasing investment from businesses in many different markets, from candidates like Federal Express, clothing stores and fast food chains. In the past several years, mapping has also become one of the very few software applications on the web that generate significant interest (so-called "killer apps"), alongside search engines, web-based email, and matchmaking.
  • mapping should in principle be highly visual, at the moment its utility for end users lies almost entirely in generating driving directions.
  • the map images which invariably accompany the driving directions are usually poorly rendered, convey little information, and cannot be navigated conveniently, making them little more than window dressing. Clicking on a pan or zoom control causes a long delay, during which the web browser becomes unresponsive, followed by the appearance of a new map image bearing little visual relationship to the previous image.
  • computers should be able to navigate digital maps more effectively than we navigate paper atlases, in practice visual navigation of maps by computer is still inferior.
  • the present invention is intended to be employed in combination with a novel technology permitting continuous and rapid visual navigation of a map (or any other image), even over a low bandwidth connection.
  • This technology relates to new techniques for rendering maps continuously in a panning and zooming environment. It is an application of fractal geometry to line and point rendering, allowing networks of roads (1D curves) and dots marking locations (0D points) to be drawn at all scales, producing the illusion of continuous physical zooming, while still keeping the "visual density" of the map bounded.
  • Related techniques apply to text labels and iconic content. This new approach to rendition avoids such effects as the sudden appearance or disappearance of small roads during a zoom, an adverse effect typical of digital map drawing. The details of this navigation technology may be found in U.S.
  • geographical information service (GIS)
  • the capabilities of the new navigation techniques of the present invention are described in detail in the aforementioned U.S. patent application.
  • the most relevant aspects of the base technology are: - smooth zooming and panning through a 2D world with perceptual continuity and advanced bandwidth management; - an infinite-precision coordinate system, allowing visual content to be nested without limit; - the ability to nest content stored on many different servers, so that spatial containment is equivalent to a hyperlink.
  • a map consists of many layers of information; ultimately, the Voss map application will allow the user to turn most of these layers on and off, making the map highly customizable.
  • Layers include: 1. roads; 2. waterways; 3. administrative boundaries; 4. aerial photography-based orthoimagery (aerial photography which has been digitally "unwarped" such that it tiles a map perfectly); 5. topography; 6. public infrastructure locations, e.g. schools, churches, public telephones, restrooms;
  • the most salient layers from the typical user's point of view are 1-4 and 7.
  • the advertising/user content layers 10-11 which are of particular interest in this patent application are also of significant interest.
  • Many of the map layers, including 1-7, are already available, at high quality and negligible cost, from the U.S. Federal Government. Value-added layers like 8-9 (and others) can be made available at any time during development or even after deployment.
  • GIS geographic information service
  • national Yellow Pages/White Pages data may also be valuable in implementing the present invention. This information may also be licensed. National Yellow Pages/White Pages data may be used in combination with geocoding to allow geographical user searches for businesses, or filtering (e.g. "highlight all restaurants in Manhattan"). Perhaps most importantly, directory listings combined with geocoding will greatly simplify associating business and personal users with geographic locations, allowing "real estate" to be rented or assigned via an online transaction and avoiding the need for a large sales force.
  • the "neartime" data are updated at least every 90 days. Combined with 90-day caching of entries already obtained on our end, this is a very economical way to obtain high-quality national listings.
  • "Realtime" data, updated nightly, are also available, but are more expensive ($0.20/hit).
  • the realtime data are identical to those used by 411 operators.
  • Although the Voss mapping application both requires downloadable client software and generates revenue through advertising, it will not suffer the disadvantages of classic advertising-based business models. Even before any substantial commercial space has been "rented", the present invention will provide a useful and visually compelling way of viewing maps and searching for addresses; that is, similar functionality to that of existing mapping applications, but with a greatly improved visual interface. Furthermore, the approach of the present invention provides limited but valuable service to non-commercial users free of charge to attract a user base. The limited service consists of hosting a small amount (5-15 MB) of server space per user, at the user's geographical location, typically a house.
  • the client software may include simple authoring capabilities, allowing users to drag and drop images and text into their "physical address", which can then be viewed by any other authorized user with the client software. (Password protection may be available.) Because the zooming user interface approach is of obvious benefit for navigating digital photo collections — especially over limited bandwidth — the photo album sharing potential alone may attract substantial numbers of users. Additional server space may be available for a modest yearly fee. This very horizontal market is likely to be a major source of revenue.
  • the sources of revenue may include: 1. Commercial "rental" of space on the map corresponding to a physical address; 2. Fees for "plus services" (defined below) geared toward commercial users; 3. Fees for "plus services" geared toward non-commercial users; 4. Professional zoomable content authoring software; 5. Licensing or partnerships with PDA, cell phone, car navigation system, etc. vendors and service providers; 6. Information.
  • Basic commercial rental of space on a map can be priced using a combination of the following variables: 1. Number of sites on the map; 2. Map area ("footprint") per site, in square meters;
  • Focusing priority allows commercial content to come into focus faster than it would otherwise, increasing its prominence in the user's "peripheral vision". This feature will be tuned to deliver commercial value without compromising the user's navigation experience.
  • 3. Including a conventional web hyperlink in the zoomable content: these may be clearly marked (e.g., with the conventional underlined blue text) and, on the user's click, open a web browser. We can either charge for including such a hyperlink, or, like Google, charge per click.
  • Making the geographic area rented refer to an outside commercial server, which will itself host zoomable content of any type and size; this is a fancier version of #3, and allows any kind of e-business to be conducted via the map.
  • Billboards: as in real life, many high-visibility areas of the map will have substantial empty space. Companies can buy this space and insert content, including hyperlinks and "hyperjumps", which if clicked will make the user jump through space to a commercial site elsewhere on the map. In contrast to ordinary commercial space, billboard space need not be rented at a fixed location; its location can be generated on the fly during user navigation.
  • zoomable Voss content will be possible from within the free client. This will include inserting text, dragging and dropping digital photos, and setting
  • Professional authoring software may be a modified version of the client designed to allow more flexible zoomable content creation, as well as facilities for making hyperlinks and hyperjumps, and inserting custom applets.
  • Use of the present invention may generate a great deal of aggregate and individual information on spatial attention density, navigation routes and other patterns. These data are of commercial value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Ecology (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of the at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming, and wherein, for example, the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power-law exponent where $a \neq -1$.

Description

METHODS AND APPARATUS FOR NAVIGATING AN IMAGE
BACKGROUND ART
The present invention relates to methods and apparatus for navigating, such as zooming and panning, over an image of an object in such a way as to provide the appearance of smooth, continuous navigational movement. Most conventional graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. It has long been recognized, however, that visual components may be represented and manipulated such that they do not have a fixed spatial scale on the display; indeed, the visual components may be panned and/or zoomed in or out. The ability to zoom in and out on an image is desirable in connection with, for example, viewing maps, browsing through text layouts such as newspapers, viewing digital photographs, viewing blueprints or diagrams, and viewing other large data sets. Many existing computer applications, such as Microsoft Word, Adobe Photoshop, Adobe Acrobat, etc., include zoomable components. In general, the zooming capability provided by these computer applications is a peripheral aspect of a user's interaction with the software and the zooming feature is only employed occasionally. These computer applications permit a user to pan over an image smoothly and continuously (e.g., utilizing scroll bars or the cursor to translate the viewed image left, right, up or down). A significant problem with such computer applications, however, is that they do not permit a user to zoom smoothly and continuously. Indeed, they provide zooming in discrete steps, such as 10%, 25%, 50%, 75%, 100%, 150%, 200%, 500%, etc. The user selects the desired zoom using the cursor and, in response, the image changes abruptly to the selected zoom level. The undesirable qualities of discontinuous zooming also exist in Internet-based computer applications. The computer application underlying the www.mapquest.com website illustrates this point.
The MapQuest website permits a user to enter one or more addresses and receive an image of a roadmap in response. FIGS. 1-4 are examples of images that one may obtain from the MapQuest website in response to a query for a regional map of Long Island, NY, U.S.A. The MapQuest website permits the user to zoom in and zoom out to discrete levels, such as 10 levels. FIG. 1 is a rendition at zoom level 5, which is approximately 100 meters/pixel. FIG. 2 is an image at a zoom level 6, which is about 35 meters/pixel. FIG. 3 is an image at a zoom level 7, which is about 20 meters/pixel. FIG. 4 is an image at a zoom level 9, which is about 10 meters/pixel. As can be seen by comparing FIGS. 1-4, the abrupt transitions between zoom levels result in a sudden loss of detail when zooming out and a sudden addition of detail when zooming in. For example, no local, secondary or connecting roads may be seen in FIG. 1 (at zoom level 5), although secondary and connecting roads suddenly appear in FIG. 2, which is the very next zoom level. Such abrupt discontinuities are very displeasing when utilizing the MapQuest website. It is noted, however, that even if the MapQuest software application were modified to permit a view of, for example, local streets at zoom level 5 (FIG. 1), the results would still be unsatisfactory. The visual density of the map would change with the zoom level: at some level of zoom the result might be pleasing (e.g., at level 7, FIG. 3), but as one zoomed in the roads would not thicken, making the map look overly sparse. As one zoomed out, the roads would eventually run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable. The ability to provide smooth, continuous zooming on images of road maps is problematic because of the varying levels of coarseness associated with the road categories.
In the United States, there are about five categories of roads (as categorized under the TIGER/Line data distributed by the U.S. Census Bureau): A1, primary highways; A2, primary roads; A3, state highways, secondary roads, and connecting roads; A4, local streets, city streets and rural roads; and A5, dirt roads. These roads may be considered the elements of an overall object (i.e., a roadmap). The coarseness of the road elements manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads. In addition, the physical dimensions of the roads (e.g., their widths) vary significantly. A1 roads may be about 16 meters wide, A2 roads may be about 12 meters wide, A3 roads may be about 8 meters wide, A4 roads may be about 5 meters wide, and A5 roads may be about 2.5 meters wide. The MapQuest computer application deals with these varying levels of coarseness by displaying only the road categories deemed appropriate at a particular zoom level. For example, a nation-wide view might only show A1 roads, while a state-wide view might show A1 and A2 roads, and a county-wide view might show A1, A2 and A3 roads. Even if MapQuest were modified to allow continuous zooming of the roadmap, this approach would lead to the sudden appearance and disappearance of road categories during zooming, which is confusing and visually displeasing. In view of the foregoing, there are needs in the art for new methods and apparatus for navigating images of complex objects, which permit smooth and continuous zooming of the image while also preserving visual distinctions between the elements of the objects based on their size or importance.
DISCLOSURE OF THE INVENTION
In accordance with one or more aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming. The non-physically proportional scaling may be expressed by the following formula: $p = c \cdot d \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a linear size in physical units of the one or more elements of the object, z is the zoom level in units of physical linear size/pixel, and a is a scale power where $a \neq -1$. Under non-physical scaling, the scale power a is not equal to -1 (typically $-1 < a < 0$) within a range of zoom levels $z_0$ and $z_1$, where $z_0$ is of a lower physical linear size/pixel than $z_1$. Preferably, at least one of $z_0$ and $z_1$ may vary for one or more elements of the object. It is noted that a, c and d may also vary from element to element. At least some elements of the at least one object may also be scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming. The physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel.
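The two scaling laws stated above can be compared numerically. The following is only a minimal sketch: the function names, the default c = 1, and the example scale power a = -0.5 are assumptions chosen for illustration and are not values taken from this disclosure. Note that for a ≠ -1 the constant c must carry whatever units make p come out in pixels.

```python
import math


def physical_width_px(d, z, c=1.0):
    """Physically proportional scaling: p = c * d / z.

    d is the element's linear size in physical units (e.g. meters),
    z is the zoom level in physical units per pixel.
    """
    return c * d / z


def nonphysical_width_px(d, z, a, c=1.0):
    """Non-physically proportional scaling: p = c * d * z**a, with a != -1."""
    if math.isclose(a, -1.0):
        raise ValueError("a == -1 is the physical case; use physical_width_px")
    return c * d * z ** a


if __name__ == "__main__":
    # Illustrative only: a 16 m wide road at several zoom levels,
    # with a hypothetical scale power a = -0.5.
    for z in (10.0, 100.0, 200.0):
        print(z, physical_width_px(16.0, z), nonphysical_width_px(16.0, z, a=-0.5))
```

With a between -1 and 0, the rendered width shrinks more slowly than the physical width as z grows, which is the behavior the text attributes to non-physical scaling.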
It is noted that the methods and apparatus described thus far and/or described later in this document may be achieved utilizing any of the known technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, programmable digital devices or systems, programmable array logic devices, or any combination of the above. The invention may also be embodied in a software program for storage in a suitable storage medium and execution by a processing unit. The elements of the object may be of varying degrees of coarseness. For example, as discussed above, the coarseness of the elements of a roadmap object manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads. Degree of coarseness in road categories also manifests in such properties as average road length, frequency of intersections, and maximum curvature. The coarseness of the elements of other image objects may manifest in other ways too numerous to list in their entirety. Thus, the scaling of the elements in a given predetermined image may be physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given predetermined image. For example, the object may be a roadmap, the elements of the object may be roads, and the varying degrees of coarseness may be road hierarchies. Thus, the scaling of a given road in a given predetermined image may be physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given predetermined image.
In accordance with one or more further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation. In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal. 
In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of a roadmap to a client terminal over a communications channel; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation. In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of at least one object to a client terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; blending two of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal. Other aspects, features, and advantages will become apparent to one of ordinary skill in the art when the description herein is taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the invention, forms are shown in the drawing, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
FIG. 1 is an image taken from the MapQuest website, which is at a zoom level 5;
FIG. 2 is an image taken from the MapQuest website, which is at a zoom level 6;
FIG. 3 is an image taken from the MapQuest website, which is at a zoom level 7;
FIG. 4 is an image taken from the MapQuest website, which is at a zoom level 9;
FIG. 5 is an image of Long Island produced at a zoom level of about 334 meters/pixel in accordance with one or more aspects of the present invention;
FIG. 6 is an image of Long Island produced at a zoom level of about 191 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 7 is an image of Long Island produced at a zoom level of about 109.2 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 8 is an image of Long Island produced at a zoom level of about 62.4 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 9 is an image of Long Island produced at a zoom level of about 35.7 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 10 is an image of Long Island produced at a zoom level of about 20.4 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 11 is an image of Long Island produced at a zoom level of about 11.7 meters/pixel in accordance with one or more further aspects of the present invention;
FIG. 12 is a flow diagram illustrating process steps that may be carried out in order to provide smooth and continuous navigation of an image in accordance with one or more aspects of the present invention;
FIG. 13 is a flow diagram illustrating further process steps that may be carried out in order to smoothly navigate an image in accordance with various aspects of the present invention;
FIG. 14 is a log-log graph of a line width in pixels versus a zoom level in meters/pixel illustrating physical and non-physical scaling in accordance with one or more further aspects of the present invention;
FIG. 15 is a log-log graph illustrating variations in the physical and non-physical scaling of FIG. 14;
FIGS. 16A-D illustrate respective antialiased vertical lines whose endpoints are precisely centered on pixel coordinates;
FIGS. 17A-C illustrate respective antialiased lines on a slant, with endpoints not positioned to fall at exact pixel coordinates; and
FIG. 18 is the log-log graph of line width versus zoom level of FIG. 14 including horizontal lines indicating incremental line widths, and vertical lines spaced such that the line width over the interval between two adjacent vertical lines changes by no more than two pixels.
BEST MODE OF CARRYING OUT THE INVENTION
Referring now to the drawings, wherein like numerals indicate like elements, there is shown in FIGS. 5-11 a series of images representing the road system of Long Island, NY, U.S.A. where each image is at a different zoom level (or resolution). Before delving into the technical details of how the present invention is implemented, these images will now be discussed in connection with desirable resultant features of using the invention, namely, at least the appearance of smooth and continuous navigation, particularly zooming, while maintaining informational integrity. It is noted that the various aspects of the present invention that will be discussed below may be applied in contexts other than the navigation of a roadmap image. Indeed, the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety. For example, the features of the present invention may be used to navigate images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to navigating images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, images of a roadmap. Although it is impossible to demonstrate the appearance of smooth and continuous zooming in a patent document, this feature has been demonstrated through experimentation and prototype development by executing a suitable software program on a Pentium-based computer. The image 100A of the roadmap illustrated in FIG. 5 is at a zoom level that may be characterized by units of physical length/pixel (or physical linear size/pixel).
In other words, the zoom level, z, represents the actual physical linear size that a single pixel of the image 100A represents. In FIG. 5, the zoom level is about 334 meters/pixel. Those skilled in the art will appreciate that the zoom level may be expressed in other units without departing from the spirit and scope of the claimed invention. FIG. 6 is an image 100B of the same roadmap as FIG. 5, although the zoom level, z, is about 191 meters/pixel. In accordance with one or more aspects of the present invention, a user of the software program embodying one or more aspects of the invention may zoom in or out between the levels illustrated in FIGS. 5 and 6. It is significant to note that such zooming has the appearance of smooth and continuous transitions from the 334 meters/pixel level (FIG. 5) to/from the 191 meters/pixel level (FIG. 6) and any levels therebetween. Likewise, the user may zoom to other levels, such as z = 109.2 meters/pixel (FIG. 7), z = 62.4 meters/pixel (FIG. 8), z = 35.7 meters/pixel (FIG. 9), z = 20.4 meters/pixel (FIG. 10), and z = 11.7 meters/pixel (FIG. 11). Again, the transitions through these zoom levels and any levels therebetween advantageously have the appearance of smooth and continuous movements. Another significant feature of the present invention as illustrated in FIGS. 5-11 is that little or no detail abruptly appears or disappears when zooming from one level to another level. The detail shown in FIG. 8 (at the zoom level of z = 62.4 meters/pixel) may also be found in FIG. 5 (at a zoom level of z = 334 meters/pixel). This is so even though the image object, in this case the roadmap, includes elements (i.e., roads) of varying degrees of coarseness. Indeed, the roadmap 100D of FIG. 8 includes at least A1 highways such as 102, A3 secondary roads such as 104, and A4 local roads such as 106. Yet these details, even the A4 local roads 106, may still be seen in image 100A of FIG. 5, which is substantially zoomed out in comparison with the image 100D of FIG. 8.
Still further, although the A4 local roads 106 may be seen at the zoom level of z = 334 meters/pixel (FIG. 5), the A1, A2, A3, and A4 roads may be distinguished from one another. Even A1 primary highways 102 and A2 primary roads 108 may be distinguished from one another vis-a-vis the relative weight given to such roads in the rendered image 100A. The ability to distinguish among the road hierarchies is also advantageously maintained when the user continues to zoom in, for example, to the zoom level of z = 20.4 meters/pixel as illustrated in image 100F of FIG. 10. Although the weight of the A1 primary highway 102 significantly increases as compared with the zoom level of z = 62.4 meters/pixel in FIG. 8, it does not increase to such an extent as to obliterate other detail, such as the A4 local roads 106 or even the A5 dirt roads. Nevertheless, the roads at lower hierarchical levels, such as A4 local roads 106, significantly increase in weight as compared with their counterparts at the zoom level z = 62.4 meters/pixel in FIG. 8. Thus, even though the dynamic range of zoom levels between that illustrated in FIG. 5 and that illustrated in FIG. 11 is substantial and detail remains substantially consistent (i.e., no roads suddenly appear or disappear while smoothly zooming), the information that the user seeks to obtain at a given zooming level is not obscured by undesirable artifacts. For example, at the zoom level of z = 334 meters/pixel (FIG. 5), the user may wish to gain a general sense of what primary highways exist and in what directions they extend. This information may readily be obtained even though the A4 local roads 106 are also depicted. At the zoom level of z = 62.4 meters/pixel (FIG. 8), the user may wish to determine whether a particular A1 primary highway 102 or A2 primary road 108 services a particular city or neighborhood.
Again, the user may obtain this information without interference from other much more detailed information, such as the existence and extent of A4 local roads 106 or even A5 dirt roads. Finally, at the zoom level of z = 11.7 meters/pixel, a user may be interested in finding a particular A4 local road such as 112, and may do so without interference by significantly larger roads such as the A1 primary highway 102. In order to achieve one or more of the various aspects of the present invention discussed above, it is contemplated that one or more computing devices execute one or more software programs that cause the computing devices to carry out appropriate actions. In this regard, reference is now made to FIGS. 12-13, which are flow diagrams illustrating process steps that are preferably carried out by the one or more computing devices and/or related equipment. While it is preferred that the process flow is carried out by commercially available computing equipment (such as Pentium-based computers), any of a number of other techniques may be employed to carry out the process steps without departing from the spirit and scope of the present invention as claimed. Indeed, the hardware employed may be implemented utilizing any other known or hereinafter developed technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, one or more programmable digital devices or systems, such as programmable read only memories (PROMs), programmable array logic devices (PALs), any combination of the above, etc. Further, the methods of the present invention may be embodied in a software program that may be stored on any of the known or hereinafter developed media. FIG.
12 illustrates an embodiment of the invention in which a plurality of images are prepared (each at a different zoom level or resolution), action 200, and two or more of the images are blended together to achieve the appearance of smooth navigation, such as zooming (action 206). Although not required to practice the invention, it is contemplated that the approach illustrated in FIG. 12 be employed in connection with a service provider - client relationship. For example, a service provider would expend the resources to prepare a plurality of pre-rendered images (action 200) and make the images available to a user's client terminal over a communications channel, such as the Internet (action 202). Alternatively, the pre-rendered images may be an integral or related part of an application program that the user loads and executes on his or her computer. It has been found through experimentation that, when the blending approach is used, a set of images at the following zoom levels work well when the image object is a roadmap: 30 meters/pixel, 50 meters/pixel, 75 meters/pixel, 100 meters/pixel, 200 meters/pixel, 300 meters/pixel, 500 meters/pixel, 1000 meters/pixel, and 3000 meters/pixel. It is noted, however, that any number of images may be employed at any number of resolutions without departing from the scope of the invention. Indeed, other image objects in other contexts may be best served by a larger or smaller number of images, where the specific zoom levels are different from the example above. Irrespective of how the images are obtained by the client terminal, in response to user-initiated navigation commands (action 204), such as zooming commands, the client terminal is preferably operable to blend two or more images in order to produce an intermediate resolution image that coincides with the navigation command (action 206).
This blending may be accomplished by a number of methods, such as the well-known trilinear interpolation technique described by Lance Williams, Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH '83, 17(3): 1-11 (1983), the entire disclosure of which is incorporated herein by reference. Other approaches to image interpolation are also useful in connection with the present invention, such as bicubic-linear interpolation, and still others may be developed in the future. It is noted that the present invention does not require or depend on any particular one of these blending methods. For example, as shown in FIG. 8, the user may wish to navigate to a zoom level of 62.4 meters/pixel. As this zoom level may be between two of the pre-rendered images (e.g., in this example between zoom level 50 meters/pixel and zoom level 75 meters/pixel), the desired zoom level of 62.4 meters/pixel may be achieved using the trilinear interpolation technique. Further, any zoom level between 50 meters/pixel and 75 meters/pixel may be obtained utilizing a blending method as described above, which if performed quickly enough provides the appearance of smooth and continuous navigation. The blending technique may be carried through to other zoom levels, such as the 35.7 meters/pixel level illustrated in FIG. 9. In such case, the blending technique may be performed as between the pre-rendered images of 30 meters/pixel and 50 meters/pixel of the example discussed thus far. The above blending approach may be used when the computing power of the processing unit on which the invention is carried out is not high enough to (i) perform the rendering operation in the first instance, and/or (ii) perform image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation.
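The selection of the two bracketing pre-rendered images and the blend between them can be sketched as follows. This is a simplified illustration, not the trilinear method of the cited reference: the function names are hypothetical, the blend weight is assumed to be linear in log(zoom) (the text does not specify the weighting), and both images are assumed to have already been resampled onto a common pixel grid and flattened to lists of intensities. Only the list of zoom levels comes from the example above.

```python
import bisect
import math

# Pre-rendered zoom levels (meters/pixel) from the roadmap example in the text.
PRERENDERED_LEVELS = [30, 50, 75, 100, 200, 300, 500, 1000, 3000]


def bracket_and_weight(z, levels=PRERENDERED_LEVELS):
    """Return (z_lo, z_hi, t) for a requested zoom level z.

    z_lo and z_hi are the adjacent pre-rendered levels enclosing z, and t in
    [0, 1] is the blend weight of the z_hi image. The log-space weighting is
    an assumption made for this sketch.
    """
    if not levels[0] <= z <= levels[-1]:
        raise ValueError("requested zoom level is outside the pre-rendered range")
    i = bisect.bisect_right(levels, z)
    if i == len(levels):  # z equals the coarsest level
        return levels[-2], levels[-1], 1.0
    lo, hi = levels[i - 1], levels[i]
    t = (math.log(z) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return lo, hi, t


def blend(pixels_lo, pixels_hi, t):
    """Per-pixel linear blend of two equally sized intensity lists."""
    if len(pixels_lo) != len(pixels_hi):
        raise ValueError("images must share a common pixel grid")
    return [(1.0 - t) * a + t * b for a, b in zip(pixels_lo, pixels_hi)]
```

For the 62.4 meters/pixel request discussed above, `bracket_and_weight(62.4)` selects the 50 and 75 meters/pixel images; for 35.7 meters/pixel it selects the 30 and 50 meters/pixel images, matching the pairs named in the text.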
As will be discussed below, however, other embodiments of the invention contemplate use of known, or hereinafter developed, high power processing units that are capable of rendering at the client terminal for blending and/or high frame rate applications.
The process flow of FIG. 13 illustrates the detailed steps and/or actions that are preferably conducted to prepare one or more images in accordance with the present invention. At action 220, the information is obtained regarding the image object or objects using any of the known or hereinafter developed techniques. Usually, such image objects have been modeled using appropriate primitives, such as polygons, lines, points, etc. For example, when the image objects are roadmaps, models of the roads in any Universal Transverse Mercator (UTM) zone may readily be obtained. The model is usually in the form of a list of line segments (in any coordinate system) that comprise the roads in the zone. The list may be converted into an image in the spatial domain (a pixel image) using any of the known or hereinafter developed rendering processes so long as it incorporates certain techniques for determining the weight (e.g., apparent or real thickness) of a given primitive in the pixel (spatial) domain. In keeping with the roadmap example above, the rendering processes should incorporate certain techniques for determining the weight of the lines that model the roads of the roadmap in the spatial domain. These techniques will be discussed below.
At action 222 (FIG. 13), the elements of the object are classified. In the case of a roadmap object, the classification may take the form of recognizing already existing categories, namely, A1, A2, A3, A4, and A5. Indeed, these road elements have varying degrees of coarseness and, as will be discussed below, may be rendered differently based on this classification. At action 224, mathematical scaling is applied to the different road elements based on the zoom level.
As will be discussed in more detail below, the mathematical scaling may also vary based on the element classification.
By way of background, there are two conventional techniques for rendering image elements such as the roads of a map: actual physical scaling, and pre-set pixel width. The actual physical scaling technique dictates that the roadmap is rendered as if viewing an actual physical image of the roads at different scales. A1 highways, for example, might be 16 meters wide, A2 roads might be 12 meters wide, A3 roads might be 8 meters wide, A4 roads might be 5 meters wide, and A5 roads might be 2.5 meters wide. Although this might be acceptable to the viewer when zoomed in on a small area of the map, as one zooms out, all roads, both major and minor, become too thin to make out. At some zoom level, say at the state level (e.g., about 200 meters/pixel), no roads would be seen at all.
The pre-set pixel width approach dictates that every road is a certain pixel width, such as one pixel in width on the display. Major roads, such as highways, may be emphasized by making them two pixels wide, etc. Unfortunately this approach makes the visual density of the map change as one zooms in and out. At some level of zoom, the result might be pleasing, e.g., at a small-size county level. As one zooms in, however, roads would not thicken, making the map look overly sparse. Further, as one zooms out, roads would run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable.
In accordance with one or more aspects of the present invention, at action 224, the images are produced in such a way that at least some image elements are scaled up and/or down either (i) physically proportional to the zoom level; or (ii) non-physically proportional to the zoom level, depending on parameters that will be discussed in more detail below.
It is noted that the scaling being "physically proportional to the zoom level" means that the number of pixels representing the road width increases or decreases with the zoom level as the size of an element would appear to change with its distance from the human eye. The perspective formula, giving the apparent length y of an object of physical size d, is: $y = c \cdot d / x$, where c is a constant determining the angular perspective and x is the distance of the object from the viewer.
In the present invention, the linear size of an object of physical linear size d' in display pixels p is given by $p = d' \cdot z^{a}$, where z is the zoom level in units of physical linear size/pixel (e.g. meters/pixel), and a is a power law. When a = -1 and d' = d (the real physical linear size of the object), this equation is dimensionally correct and becomes equivalent to the perspective formula, with p = y and z = x/c. This expresses the equivalence between physical zooming and perspective transformation: zooming in is equivalent to moving an object closer to the viewer, and zooming out is equivalent to moving the object farther away.
To implement non-physical scaling, a may be set to a power law other than -1, and d' may be set to a physical linear size other than the actual physical linear size d. In the context of a road map, where p may represent the displayed width of a road in pixels and d' may represent an imputed width in physical units, "non-physically proportional to the zoom level" means that the road width in display pixels increases or decreases with the zoom level in a way other than being physically proportional to the zoom level, i.e. a ≠ -1. The scaling is distorted in a way that achieves certain desirable results.
It is noted that "linear size" means one-dimensional size. For example, if one considers any 2 dimensional object and doubles its "linear size" then one multiplies the area by $4 = 2^2$. In the two dimensional case, the linear sizes of the elements of an object may involve length, width, radius, diameter, and/or any other measurement that one can read off with a ruler on the Euclidean plane. The thickness of a line, the length of a line, the diameter of a circle or disc, the length of one side of a polygon, and the distance between two points are all examples of linear sizes. In this sense the "linear size" in two dimensions is the distance between two identified points of an object on a 2D Euclidean plane. For example, the linear size can be calculated as $\sqrt{dx^2 + dy^2}$, where dx = x1 - x0, dy = y1 - y0, and the two identified points are given by the Cartesian coordinates (x0, y0) and (x1, y1). The concept of "linear size" extends naturally to more than two dimensions; for example, if one considers a volumetric object, then doubling its linear size involves increasing the volume by $8 = 2^3$. Similar measurements of linear size can also be defined for non-Euclidean spaces, such as the surface of a sphere.
Any power law a < 0 will cause the rendered size of an element to decrease as one zooms out, and increase as one zooms in. When a < -1, the rendered size of the element will decrease faster than it would with proportional physical scaling as one zooms out. Conversely, when -1 < a < 0, the size of the rendered element decreases more slowly than it would with proportional physical scaling as one zooms out.
In accordance with at least one aspect of the invention, p(z), for a given length of a given object, is permitted to be substantially continuous so that during navigation the user does not experience a sudden jump or discontinuity in the size of an element of the image (as opposed to the conventional approaches that permit the most extreme discontinuity: a sudden appearance or disappearance of an element during navigation).
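The scaling relation and the perspective formula discussed above can be compared numerically. The sketch below is illustrative only; the function names are hypothetical, and the sample values (a 16 meter element, an exponent of -0.5) are chosen merely to show how the exponent a changes the rate at which an element shrinks.

```python
import math


def rendered_size_px(d_prime, z, a=-1.0):
    """p = d' * z**a: size in display pixels of an element of imputed size d'.

    z is the zoom level in physical units per pixel (e.g. meters/pixel).
    """
    return d_prime * z ** a


def perspective_length(d, x, c=1.0):
    """y = c * d / x: apparent length of an object of size d at distance x."""
    return c * d / x


# With a = -1 and z = x / c, the scaling law reproduces the perspective formula.
c, x, d = 2.0, 8.0, 16.0
physical = rendered_size_px(d, x / c, a=-1.0)
assert math.isclose(physical, perspective_length(d, x, c))

# Doubling z halves the size when a = -1, but shrinks it only by a factor
# of sqrt(2) when a = -0.5 (a non-physical exponent with -1 < a < 0).
ratio_physical = rendered_size_px(d, 1.0, -1.0) / rendered_size_px(d, 2.0, -1.0)
ratio_slow = rendered_size_px(d, 1.0, -0.5) / rendered_size_px(d, 2.0, -0.5)
```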
In addition, it is preferred that p(z) monotonically decrease with zooming out such that zooming out causes the elements of the object to become smaller (e.g., roads to become thinner), and such that zooming in causes the elements of the object to become larger. This gives the user a sense of physicality about the object(s) of the image.
The scaling features discussed above may be more fully understood with reference to FIG. 14, which is a log-log graph of a rendered line width in pixels for an A1 highway versus the zoom level in meters/pixel. (Plotting log(z) on the x-axis and log(p) on the y-axis is convenient because the plots become straight lines due to the relationship $\log(x^{a}) = a \cdot \log(x)$.) The basic characteristics of the line (road) width versus zoom level plot are:
(i) that the scaling of the road widths may be physically proportional to the zoom level when zoomed in (e.g., up to about 0.5 meters/pixel);
(ii) that the scaling of the road widths may be non-physically proportional to the zoom level when zoomed out (e.g., above about 0.5 meters/pixel); and
(iii) that the scaling of the road widths may be physically proportional to the zoom level when zoomed further out (e.g., above about 50 meters/pixel or higher depending on parameters which will be discussed in more detail below).
As for the zone in which the scaling of the road widths is physically proportional to the zoom level, the scaling formula $p = d' \cdot z^{a}$ is employed where a = -1. In this example, a reasonable value for the physical width of an actual A1 highway is about d' = 16 meters. Thus, the rendered width of the line representing the A1 highway monotonically decreases with physical scaling as one zooms out at least up to a certain zoom level z0, say z0 = 0.5 meters/pixel. The zoom level z0 = 0.5 meters/pixel is chosen to be an inner scale below which physical scaling is applied. This avoids a non-physical appearance when the roadmap is combined with other fine-scale GIS content with real physical dimensions. In this example, z0 = 0.5 meters/pixel, or 2 pixels/meter, which when expressed as a map scale on a 15 inch display (with 1600x1200 pixel resolution) corresponds to a scale of about 1:2600. At d = 16 meters, which is a reasonable real physical width for A1 roads, the rendered road will appear to be its actual size when one is zoomed in (0.5 meters/pixel or less). At a zoom level of 0.1 meters/pixel, the rendered line is about 160 pixels wide. At a zoom level of 0.5 meters/pixel, the rendered line is 32 pixels wide.
As for the zone in which the scaling of the road widths is non-physically proportional to the zoom level, the scaling formula $p = d' \cdot z^{a}$ is employed where -1 < a < 0 (within a range of zoom levels z0 to z1). In this example, the non-physical scaling is performed between about z0 = 0.5 meters/pixel and z1 = 3300 meters/pixel. Again, when -1 < a < 0, the width of the rendered road decreases more slowly than it would with proportional physical scaling as one zooms out. Advantageously, this permits the A1 road to remain visible (and distinguishable from other smaller roads) as one zooms out. For example, as shown in FIG. 5, the A1 road 102 remains visible and distinguishable from other roads at the zoom level of z = 334 meters/pixel.
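The map scale of about 1:2600 quoted above for z0 = 0.5 meters/pixel can be checked with a short calculation. The sketch below assumes square pixels and takes the 15 inch figure to be the viewable diagonal of the 1600x1200 display; the function name is hypothetical.

```python
import math


def map_scale_denominator(z_m_per_px, diagonal_in, res_x, res_y):
    """Return N such that the display shows the map at a scale of 1:N.

    Assumes square pixels and that diagonal_in is the viewable diagonal.
    """
    diagonal_px = math.hypot(res_x, res_y)           # 2000 px for 1600x1200
    pixel_size_m = diagonal_in * 0.0254 / diagonal_px  # physical size of one pixel
    return z_m_per_px / pixel_size_m


scale = map_scale_denominator(0.5, 15, 1600, 1200)   # roughly 2625
```

Under these assumptions one pixel is about 0.19 millimeters across, so 0.5 meters of ground per pixel corresponds to a scale of roughly 1:2625, consistent with the figure of about 1:2600 given above.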
Assuming that the physical width of the A1 road is d' = d = 16 meters, the width of the rendered line using physical scaling would have been about 0.005 pixels at a zoom level of about 3300 meters/pixel, rendering it virtually invisible. Using non-physical scaling, however, where -1 < a < 0 (in this example, a is about -0.473), the width of the rendered line is about 0.5 pixels at a zoom level of 3300 meters/pixel, rendering it clearly visible. It is noted that the value for z1 is chosen to be the most zoomed-out scale at which a given road still has "greater than physical" importance. By way of example, if the entire U.S. is rendered on a 1600x1200 pixel display, the resolution would be approximately 3300 meters/pixel or 3.3 kilometers/pixel. If one looks at the entire world, then there may be no reason for U.S. highways to assume enhanced importance relative to the view of the country alone.
Thus, at zoom levels above z1, which in the example above is about 3300 meters/pixel, the scaling of the road widths is again physically proportional to the zoom level, but preferably with a large d' (much greater than the real width d) for continuity of p(z). In this zone, the scaling formula $p = d' \cdot z^{a}$ is employed where a = -1. In order for the rendered road width to be continuous at z1 = 3300 meters/pixel, a new imputed physical width of the A1 highway is chosen, for example, d' = 1.65 kilometers. z1 and the new value for d' are preferably chosen in such a way that, at the outer scale z1, the rendered width of the line will be a reasonable number of pixels. In this case, at a zoom level in which the entire nation may be seen on the display (about 3300 meters/pixel), A1 roads may be about 0.5 pixel wide, which is thin but still clearly visible; this corresponds to an imputed physical road width of 1650 meters, or 1.65 kilometers.
The above suggests a specific set of equations for the rendered line width as a function of the zoom level:
$$p(z) = d_0 \cdot z^{-1}, \quad \text{if } z \le z_0,$$
$$p(z) = d_1 \cdot z^{a}, \quad \text{if } z_0 < z < z_1,$$
$$p(z) = d_2 \cdot z^{-1}, \quad \text{if } z \ge z_1.$$
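Requiring p(z) to be continuous at the two breakpoints ties the parameters together. The short derivation below is a sketch that follows directly from the three equations above; the numerical check uses the A1 values quoted in the text (d0 = 16 meters, z0 = 0.5 meters/pixel, z1 = 3300 meters/pixel, d2 = 1650 meters).

$$d_0 \cdot z_0^{-1} = d_1 \cdot z_0^{a}, \qquad d_1 \cdot z_1^{a} = d_2 \cdot z_1^{-1}$$

Dividing the second condition by the first eliminates $d_1$:

$$\left(\frac{z_1}{z_0}\right)^{a} = \frac{d_2 / z_1}{d_0 / z_0} \quad\Longrightarrow\quad a = \frac{\ln\big((d_2 / z_1)/(d_0 / z_0)\big)}{\ln(z_1 / z_0)}, \qquad d_1 = d_0 \cdot z_0^{-(1 + a)}$$

For the A1 example, $d_0 / z_0 = 32$ pixels and $d_2 / z_1 = 0.5$ pixels, so

$$a = \frac{\ln(0.5 / 32)}{\ln(6600)} \approx \frac{-4.159}{8.795} \approx -0.473 .$$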
The above form of p(z) has six parameters: z0, z1, d0, d1, d2 and a. z0 and z1 mark the scales at which the behavior of p(z) changes. In the zoomed-in zone (z ≤ z0), zooming is physical (i.e., the exponent of z is -1), with a physical width of d0, which preferably corresponds to the real physical width d. In the zoomed-out zone (z ≥ z1), zooming is again physical, but with a physical width of d2, which in general does not correspond to d. Between z0 and z1, the rendered line width scales with a power law of a, which can be a value other than -1. Given the preference that p(z) is continuous, specifying z0, z1, d0 and d2 is sufficient to uniquely determine d1 and a, which is clearly shown in FIG. 14.
The approach discussed above with respect to A1 roads may be applied to the other road elements of the roadmap object. An example of applying these scaling techniques to the A1, A2, A3, A4, and A5 roads is illustrated in the log-log graph of FIG. 15. In this example, z0 = 0.5 meters/pixel for all roads, although it may vary from element to element depending on the context. As A2 roads are generally somewhat smaller than A1 roads, d0 = 12 meters. Further, A2 roads are "important," e.g., on the U.S. state level, so z1 = 312 meters/pixel, which is approximately the rendering resolution for a single state (about 1/10 of the country in linear scale). At this scale, it has been found that line widths of one pixel are desirable, so d2 = 312 meters is a reasonable setting.
Using the general approach outlined above for A1 and A2 roads, the parameters of the remaining elements of the roadmap object may be established. A3 roads: d0 = 8 meters, z0 = 0.5 meters/pixel, z1 = 50 meters/pixel, and d2 = 100 meters. A4 streets: d0 = 5 meters, z0 = 0.5 meters/pixel, z1 = 20 meters/pixel, and d2 = 20 meters. And A5 dirt roads: d0 = 2.5 meters, z0 = 0.5 meters/pixel, z1 = 20 meters/pixel, and d2 = 20 meters.
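The three-zone width function and the road parameters listed above can be put together in a few lines. This is a minimal sketch, not the patented implementation: the names `fit_power_law`, `rendered_width`, and `ROADS` are hypothetical, and d1 and a are obtained from the continuity preference stated in the text.

```python
import math


def fit_power_law(z0, z1, d0, d2):
    """Return (a, d1) that make p(z) continuous at z0 and z1."""
    p_inner = d0 / z0  # pixel width at the inner scale z0
    p_outer = d2 / z1  # pixel width at the outer scale z1
    a = math.log(p_outer / p_inner) / math.log(z1 / z0)
    d1 = p_inner / z0 ** a
    return a, d1


def rendered_width(z, z0, z1, d0, d2):
    """Rendered line width in pixels at zoom level z (meters/pixel)."""
    a, d1 = fit_power_law(z0, z1, d0, d2)
    if z <= z0:
        return d0 / z        # physical scaling with the real width
    if z < z1:
        return d1 * z ** a   # non-physical power-law scaling
    return d2 / z            # physical scaling with the imputed width


# (z0, z1, d0, d2) as quoted in the text, in meters/pixel and meters.
ROADS = {
    "A1": (0.5, 3300.0, 16.0, 1650.0),
    "A2": (0.5, 312.0, 12.0, 312.0),
    "A3": (0.5, 50.0, 8.0, 100.0),
    "A4": (0.5, 20.0, 5.0, 20.0),
    "A5": (0.5, 20.0, 2.5, 20.0),
}

a_A1, d1_A1 = fit_power_law(*ROADS["A1"])
```

For the A1 parameters this yields an exponent of about -0.473, and the width function passes through 32 pixels at 0.5 meters/pixel and 0.5 pixels at 3300 meters/pixel without a jump at either breakpoint.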
It is noted that using these parameter settings, A5 dirt roads look more and more like streets at zoomed-out zoom levels, while their physical scale when zoomed in is half as wide. The log-log plot of FIG. 15 summarizes the scaling behaviors for the road types. It is noted that at every scale the apparent width of A1>A2>A3>A4>=A5. Note also that, with the exception of dirt roads, the power laws all come out in the neighborhood of a = -0.41. The dotted lines all have a slope of -1 and represent physical scaling at different physical widths. From the top down, the corresponding physical widths of these dotted lines are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16 meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
When interpolation between a plurality of pre-rendered images is used, it is possible in many cases to ensure that the resulting interpolation is humanly indistinguishable or nearly indistinguishable from an ideal rendition of all lines or other primitive geometric elements at their correct pixel widths as determined by the physical and non-physical scaling equations. To appreciate this alternative embodiment of the current invention, some background on antialiased line drawing will be presented below.
The discussion of antialiased line drawing will be presented in keeping with the roadmap example discussed at length above, in which all primitive elements are lines, and the line width is subject to the scaling equations as described previously. With reference to FIG. 16A, a one pixel wide vertical line drawn in black on white background, such that the horizontal position of the line is aligned exactly to the pixel grid, consists simply of a 1-pixel-wide column of black pixels on a white background. In accordance with various aspects of the present invention, it is desirable to consider and accommodate the case where the line width is a non-integral number of pixels. With reference to FIG.
16B, if the endpoints of a line remain fixed, but the weight of the line is increased to be 1.5 pixels wide, then on an antialiased graphics display, the columns of pixels to the left and right of the central column are drawn at 25% grey. With reference to FIG. 16C, at 2 pixels wide, these flanking columns are drawn at 50% grey. With reference to FIG. 16D, at 3 pixels wide, the flanking columns are drawn at 100% black, and the result is three solid black columns as expected. This approach to drawing lines of non-integer width on a pixellated display results in a sense (or illusion) of visual continuity as line width changes, allowing lines of different widths to be clearly distinguished even if they differ in width only by a fraction of a pixel. In general, this approach, known as antialiased line drawing, is designed to ensure that the line integral of the intensity function (or "1-intensity" function, for black lines on a white background) over a perpendicular to the line drawn is equal to the line width. This method generalizes readily to lines whose endpoints do not lie precisely in the centers of pixels, to lines which are in other orientations than vertical, and to curves. Note that drawing the antialiased vertical lines of FIGS.
16A-D could also be accomplished by alpha-blending two images, one (image A) in which the line is 1 pixel wide, and the other (image
B) in which the line is 3 pixels wide. Alpha blending assigns to each pixel on the display (1-alpha) * (corresponding pixel in image A) + alpha * (corresponding pixel in image B). As alpha is varied between zero and one, the effective width of the rendered line varies smoothly between one and three pixels. This alpha-blending approach only produces good visual results in the most general case if the difference between the two rendered line widths in images A and B is one pixel or less; otherwise, lines may appear haloed at intermediate widths. This same approach can be applied to rendering points, polygons, and many other primitive graphical elements at different linear sizes.
Turning again to FIGS. 16A-D, the 1.5 pixel-wide line (FIG. 16B) and the 2 pixel-wide line (FIG. 16C) can be constructed by alpha-blending between the 1 pixel wide line (FIG. 16A) and the 3 pixel wide line (FIG. 16D). With reference to FIGS. 17A-C, a 1 pixel wide line (FIG. 17A), a 2 pixel wide line (FIG. 17B) and a 3 pixel wide line (FIG. 17C) are illustrated in an arbitrary orientation. The same principle applies to the arbitrary orientation of FIGS. 17A-C as to the case where the lines are aligned exactly to the pixel grid, although the spacing of the line widths between which to alpha-blend may need to be finer than two pixels for good results.
In the context of the present map example, a set of images of different resolutions can be selected for pre-rendition with reference to the log-log plots of FIGS. 14-15. For example, reference is now made to FIG. 18, which is substantially similar to FIG. 14 except that FIG. 18 includes a set of horizontal lines and vertical lines. The horizontal lines indicate line widths between 1 and 10 pixels, in increments of one pixel. The vertical lines are spaced such that line width over the interval between two adjacent vertical lines changes by no more than two pixels.
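The alpha-blending construction of FIGS. 16A-D can be reproduced on a single pixel row. In the sketch below, pixel values are stored as "1 - intensity" (0.0 is white, 1.0 is black), each image is reduced to one five-pixel row for brevity, and the helper names are hypothetical.

```python
def alpha_blend(image_a, image_b, alpha):
    """Per pixel: (1 - alpha) * A + alpha * B."""
    return [
        [(1.0 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def effective_width(row):
    """Sum of darkness across a row perpendicular to a vertical line."""
    return sum(row)


# One row through a vertical black line, as darkness values.
LINE_1PX = [[0.0, 0.0, 1.0, 0.0, 0.0]]  # image A (FIG. 16A)
LINE_3PX = [[0.0, 1.0, 1.0, 1.0, 0.0]]  # image B (FIG. 16D)

row_15 = alpha_blend(LINE_1PX, LINE_3PX, 0.25)[0]  # flanking columns at 25% grey
row_20 = alpha_blend(LINE_1PX, LINE_3PX, 0.50)[0]  # flanking columns at 50% grey
```

With alpha = 0.25 the flanking columns come out at 25% grey and the row sums to 1.5, and with alpha = 0.5 they come out at 50% grey and the row sums to 2, which is the behavior described for FIGS. 16B and 16C.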
Thus, the vertical lines represent a set of zoom values suitable for pre-rendition, wherein alpha-blending between two adjacent such pre-rendered images will produce characteristics nearly equivalent to rendering the lines representing roads at continuously variable widths. Interpolation between the six resolutions represented by the vertical lines shown in FIG. 18 is sufficient to render the A1 highways accurately using the scaling curve shown at about nine meters/pixel and above. Rendition below about nine meters/pixel does not require pre-rendition, as such views are very zoomed-in and thus show very few roads, making it more computationally efficient (and more efficient with respect to data storage requirements) to render them vectorially than to interpolate between pre-rendered images. At the coarsest resolutions, beyond the last pre-rendered zoom level (such views encompass large fractions of the Earth's surface), the final pre-rendered image alone can be used, as it is a rendition using 1 pixel wide lines. Lines that are thinner than a single pixel render the same pixels more faintly. Hence, to produce an image in which the A1 lines are 0.5 pixels wide, the 1 pixel wide line image can be multiplied by an alpha of 0.5.
In practice, a somewhat larger set of resolutions is pre-rendered, such that over each interval between resolutions, none of the scaling curves of FIG. 15 varies by more than one pixel. Reducing the allowed variation to one pixel can result in improved rendering quality.
Notably, the tiling techniques contemplated and discussed in the following co-pending application may be considered in connection with the present invention: U.S. Patent Application No. 10/790,253, entitled SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE, filed March 1, 2004, Attorney Docket No. 489/2, the entire disclosure of which is hereby incorporated by reference.
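One way to pick zoom levels for pre-rendition is to invert the scaling curve at a ladder of line widths, so that the width changes by no more than a chosen step between adjacent levels. The sketch below does this for the power-law zone only. The function names, the starting width of 10 pixels, and the use of the A1 values (32 pixels at 0.5 meters/pixel, a of about -0.473) are illustrative assumptions; the sketch does not attempt to reproduce the exact vertical lines of FIG. 18.

```python
def zoom_for_width(width_px, p0, z0, a):
    """Invert p(z) = p0 * (z / z0)**a for z (valid inside the power-law zone)."""
    return z0 * (width_px / p0) ** (1.0 / a)


def prerender_levels(w_start, w_end, max_step, p0, z0, a):
    """Zoom levels at which the width equals w_start, w_start - max_step, ...

    The last level is placed at w_end, so adjacent levels never differ by
    more than max_step pixels of line width.
    """
    levels = []
    w = w_start
    while w > w_end:
        levels.append(zoom_for_width(w, p0, z0, a))
        w -= max_step
    levels.append(zoom_for_width(w_end, p0, z0, a))
    return levels


# A1 highway: 32 px wide at z0 = 0.5 m/px, exponent a of about -0.473.
levels_a1 = prerender_levels(w_start=10, w_end=1, max_step=2, p0=32.0, z0=0.5, a=-0.473)
```

With these inputs the ladder of widths is 10, 8, 6, 4, 2 and 1 pixels, giving six zoom levels that grow coarser as the line thins. Halving `max_step` to one pixel lengthens the list, in line with the remark that a somewhat larger set of resolutions is pre-rendered in practice.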
This tiling technique may be employed for resolving an image at a particular zoom level, even if that level does not coincide with a pre-rendered image. If each image in the somewhat larger set of resolutions is pre-rendered at the appropriate resolution and tiled, then the result is a complete system for zooming and panning navigation through a roadmap of arbitrary complexity, such that all lines appear to vary in width continuously in accordance with the scaling equations disclosed herein.
Additional details concerning other techniques for blending images, which may be employed in connection with implementing the present invention, may be found in U.S. Provisional Patent Application No. 60/475,897, entitled SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTI RESOLUTIONAL VISUAL DATA, filed June 5, 2003, the entire disclosure of which is hereby incorporated by reference. Still further details concerning blending techniques that may be employed in connection with implementing the present invention may be found in U.S. Provisional Patent Application Serial No. 60/453,897, filed March 12, 2003, entitled SYSTEM AND METHOD FOR FOVEATED, SEAMLESS, PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE, the entire disclosure of which is hereby incorporated
by reference. Advantageously, employing the above-discussed aspects of the present invention, the user enjoys the appearance of smooth and continuous navigation through the various zoom levels. Further, little or no detail abruptly appears or disappears when zooming from one level to another. This represents a significant advancement over the state of the art. It is contemplated that the various aspects of the present invention may be applied in numerous products, such as interactive software applications over the Internet, automobile-based software applications and the like. For example, the present invention may be employed by an Internet website that provides maps and driving directions to client terminals in response to user requests. Alternatively, various aspects of the invention may be employed in a GPS navigation system in an automobile. The invention may also be incorporated into medical imaging equipment, whereby detailed information concerning, for example, a patient's circulatory system, nervous system, etc. may be rendered and navigated as discussed hereinabove. The applications of the invention are too numerous to list in their entirety, yet a skilled artisan will recognize that they are contemplated herein and fall within the scope of the invention as claimed. The present invention may also be utilized in connection with other applications in which the rendered images provide a means for advertising and otherwise advancing commerce. Additional details concerning these aspects and uses of the present invention may be found in U.S. Provisional Patent Application No. 60/553,803, entitled METHODS AND APPARATUS FOR EMPLOYING MAPPING TECHNIQUES TO ADVANCE COMMERCE, filed on even date herewith, Attorney Docket No. 489/7, the entire disclosure of which is hereby incorporated by reference. An appendix is provided within this document and appears after the claims. The appendix is part of the disclosure of this application. 
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
APPENDIX
APPLICATION FOR LETTERS PATENT
FOR
SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE
BY
BLAISE HILARY AGUERA Y ARCAS
Kaplan & Gilman, LP Attorney No. 489/2 SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE
RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional No. 60/452,075, filed on March 5, 2003, U.S. Provisional No. 60/453,897, filed on March 12, 2003, U.S. Provisional No. 60/475,897, filed on June 5, 2003, and U.S. Provisional No. 60/474,313, filed on May 30, 2003.
FIELD OF THE INVENTION
[0002] The present invention relates generally to graphical zooming user interfaces (ZUI) for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner that is both computationally efficient, resulting in good user responsiveness and interactive frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content is ultimately drawn without the resampling which would normally lead to degradation in image quality, and without interpolation of other images, which would also lead to degradation.
BACKGROUND OF THE INVENTION
[0003] Most present-day graphical computer user interfaces (GUIs) are designed using visual components of a fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, and then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom and pan continuously in a user-friendly manner is absent from prior art systems.
[0004] First, we set forth several definitions. A display is the device or devices used to output rendered imagery to the user. A frame buffer is used to dynamically represent the contents of at least a portion of the display. Display refresh rate is the rate at which the physical display, or portion thereof, is refreshed using the contents of the frame buffer. A frame buffer's frame rate is the rate at which the frame buffer is updated.
[0005] For example, in a typical personal computer, the display refresh rate is 60-90 Hz. Most digital video, for example, has a frame rate of 24-30 Hz. Thus, each frame of digital video will actually be displayed at least twice as the display is refreshed. Plural frame buffers may be utilized at different frame rates and thus be displayed substantially simultaneously on the same display. This would occur, for example, when two digital videos with different frame rates were being played on the same display, in different windows.
[0006] One problem with zooming user interfaces (ZUI) is that the visual content has to be displayed at different resolutions as the user zooms. The ideal solution to this problem would be to display, in every consecutive frame, an exact and newly computed image based on the underlying visual content. The problem with such an approach is that the exact recalculation of each resolution of the visual content in real time as the user zooms is computationally impractical if the underlying visual content is complex.
[0007] As a result of the foregoing, many prior art ZUI systems use a plurality of precomputed images, each being a representation of the same visual content but at different resolutions. We term each of those different precomputed images a Level of Detail (LOD). The complete set of LODs, organized conceptually as a stack of images of decreasing resolution, is termed the LOD pyramid (see Fig. 1). In such prior systems, as zooming occurs, the system interpolates between the LODs and displays a resulting image at a desired resolution. While this approach solves the computational issue, it displays a final compromised image that is often blurred and unrealistic, and often involves loss of information due to the fact that it represents interpolation of different LODs. These interpolation errors are especially noticeable when the user stops zooming and has the opportunity to view a still image at a chosen resolution which does not precisely match the resolution of any of the LODs.
[0008] Another problem with interpolating between precomputed LODs is that this approach typically treats vector data in the same way as photographic or image data. Vector data, such as blueprints or line drawings, are displayed by processing a set of abstract instructions using a rendering algorithm, which can render lines, curves and other primitive shapes at any desired resolution. Text rendered using scalable fonts is an important special case of vector data. Image or photographic data (including text rendered using bitmapped fonts) are not so generated, but must be displayed either by interpolation between precomputed LODs or by resampling an original image. We refer to the latter herein as nonvector data.
[0009] Prior art systems that use rendering algorithms to redisplay vector data at a new resolution for each frame during a zoom sequence must restrict themselves to simple vector drawings only in order to achieve interactive frame rates. On the other hand, prior art systems that precompute LODs for vector data and interpolate between them, as for nonvector data, suffer from markedly degraded visual quality, as the sharp edges inherent in most vector data renditions are particularly sensitive to interpolation error. This degradation is usually unacceptable for textual content, which is a special case of vector data.
[0010] It is an object of the invention to create a ZUI that replicates the zooming effect a user would see if he or she actually had viewed a physical object and moved it closer to himself or herself.
[0011] It is an object of the invention to create a ZUI that displays images at an appropriate resolution but which avoids or diminishes the interpolation errors in the final displayed image. A further object of the present invention is to allow the user to zoom arbitrarily far in on vector content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
[0012] A further object of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, while both preserving the overall appearance of the content and maintaining interactive frame rates.
[0013] A further object of the present invention is to diminish the user's perception of transitions between LODs or rendition qualities during interaction.
[0014] A further object of the present invention is to allow the graceful degradation of image quality by blurring when information ordinarily needed to render portions of the image is as yet incomplete.
[0015] A further object of the present invention is to gradually increase image quality by bringing it into sharper focus as more complete information needed to render portions of the image becomes available.
[0016] It is an object of the invention to optimally and independently render both vector and nonvector data.
[0017] These and other objects of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
SUMMARY OF THE INVENTION
[0018] The above and other problems of the prior art are overcome in accordance with the present invention, which relates to a hybrid strategy for implementing a ZUI allowing an image to be displayed at a dynamically varying resolution as a user zooms in or out, rotates, pans, or otherwise changes his or her view of an image. Any such change in view is termed navigation. Zooming of the image to a resolution not equal to that of any of the predefined LODs is accomplished by displaying the image at a new resolution that is interpolated from predefined LODs that "surround" the desired resolution. By "surrounding LODs" we mean the LOD of lowest resolution which is greater than the desired resolution and the LOD of highest resolution which is less than the desired resolution. If the desired resolution is either greater than the resolution of the LOD with the highest available resolution or less than the resolution of the LOD with the lowest resolution, then there will be only a single "surrounding LOD". The dynamic interpolation of an image at a desired resolution based on a set of precomputed LODs is termed in the literature mipmapping or trilinear interpolation. The latter term further indicates that bilinear sampling is used to resample the surrounding LODs, followed by linear interpolation between these resampled LODs (hence trilinear). See, e.g., Lance Williams, "Pyramidal Parametrics," Computer Graphics (Proc. SIGGRAPH '83) 17(3): 1-11 (1983). The foregoing document is incorporated herein by reference in its entirety. Obvious modifications of or extensions to the mipmapping technique introduced by Williams use nonlinear resampling and/or interpolation of the surrounding LODs. In the present invention it is immaterial whether the resampling and interpolation operations are zeroth-order (nearest-neighbor), linear, higher-order, or more generally nonlinear.
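The selection of "surrounding LODs" described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the function names `surrounding_lods` and `blend_weights` are hypothetical, resolutions are given as plain numbers sorted in increasing order, an exact match is treated as a single LOD, and the blend weight is taken to be linear in resolution (the specification states that the order of interpolation is immaterial).

```python
from bisect import bisect_left


def surrounding_lods(lod_resolutions, desired):
    """Return a tuple of indices of the LOD(s) surrounding `desired`.

    `lod_resolutions` must be sorted in increasing order. One index is
    returned when `desired` lies outside the available range (or, as an
    assumption of this sketch, when it matches an LOD exactly).
    """
    i = bisect_left(lod_resolutions, desired)
    if i < len(lod_resolutions) and lod_resolutions[i] == desired:
        return (i,)                          # exact match: no blending needed
    if i == 0:
        return (0,)                          # below the lowest-resolution LOD
    if i == len(lod_resolutions):
        return (len(lod_resolutions) - 1,)   # above the highest-resolution LOD
    return (i - 1, i)                        # next lower and next higher LOD


def blend_weights(lod_resolutions, desired):
    """Map each surrounding LOD index to its (assumed linear) blend weight."""
    idx = surrounding_lods(lod_resolutions, desired)
    if len(idx) == 1:
        return {idx[0]: 1.0}
    lo, hi = idx
    r_lo, r_hi = lod_resolutions[lo], lod_resolutions[hi]
    t = (desired - r_lo) / (r_hi - r_lo)
    return {lo: 1.0 - t, hi: t}
```

For example, with LODs at resolutions 64, 128, 256 and 512, a desired resolution of 192 falls between the second and third LODs, and this sketch would weight them equally.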
[0019] In accordance with the invention described herein, when the user defines an exact desired resolution, which is almost never the resolution of one of the predefined LODs, the final image is then displayed by preferably first displaying an intermediate final image. The intermediate final image is the first image displayed at the desired resolution before that image is refined as described hereafter. The intermediate final image may correspond to the image that would be displayed at the desired resolution using the prior art.
[0020] In a preferred embodiment, the transition from the intermediate final image to the final image may be gradual, as explained in more detail below.
[0021] In an enhanced embodiment, the present invention allows LODs to be spaced in any resolution increments, including irrational increments (i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers), as explained in more detail below.
[0022] In another enhanced embodiment, portions of the image at each different LOD are denoted tiles, and such tiles are rendered in an order that minimizes any perceived imperfections to a viewer. In other embodiments, the displayed visual content is made up of plural LODs (potentially a superset of the surrounding LODs as described above), each of which is displayed in the proper proportion and location in order to cause the display to gradually fade into the final image in a manner that conceals imperfections.
[0023] The rendition of various tiles in plural LODs is accomplished in an order that optimizes the appearance of the visual content while staying within acceptable levels of computational complexity so that the system can run on standard computers with typical clock speeds available in most laptop and desktop personal computers.
[0024] The present invention involves a hybrid strategy, in which an image is displayed using predefined LODs during rapid zooming and panning, but when the view stabilizes sufficiently, an exact LOD is rendered and displayed. The exact LOD is rendered and displayed at the precise resolution chosen by the user, which is normally different from the predefined LODs. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with far less computation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 depicts an LOD pyramid (in this case the base of the pyramid, representing the highest-resolution representation, is a 512x512 sample image, and successive minifications of this image are shown in factors of 2);
[0026] Figure 2 depicts a flow chart for use in an exemplary embodiment of the invention;
[0027] Figure 3 is another flow chart that shows how the system displays the final image after zooming;
[0028] Figure 4 is the LOD pyramid of Figure 1 with grid lines added showing the subdivision of each LOD into rectangular tiles of equal size in samples;
[0029] Figure 5 is another flow chart, for use in connection with the present invention, and it depicts a process for displaying rendered tiles on a display;
[0030] Figure 6 shows a concept termed irrational tiling, explained in more detail herein; and
[0031] Figure 7 depicts a composite tile and the tiles that make up the composite tile, as explained more fully below.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0032] Figure 2 shows a flowchart of a basic technique for implementation of the present invention. The flowchart of Figure 2 represents an exemplary embodiment of the invention and would begin executing when an image is displayed at an initial resolution. It is noted that the invention may be used in the client server model, but that the client and server may be on the same or different machines. Thus, for example, there could be a set of discrete LODs stored remotely at a host computer, and the user can be connected to said host through a local PC. The actual hardware platform and system utilized are not critical to the present invention.
[0033] The flowchart is entered at start block 201 with an initial view of an image at a particular resolution. In this example, the image is taken to be static. The image is displayed at block 202. A user may navigate that image by moving, for example, a computer mouse. The initial view displayed at block 202 will change when the user navigates the image. It is noted that the underlying image may itself be dynamic, such as in the case of motion video; however, for purposes of this example, the image itself is treated as static. As explained above, any image to be displayed may also have textual or other vector data and/or nonvector data such as photographs and other images. The present invention, and the entire discussion below, is applicable regardless of whether the image comprises vector or nonvector data, or both.
[0034] Regardless of the type of visual content displayed in block 202, the method transfers control to decision point 203 at which navigation input may be detected. If such input is not detected, the method loops back to block 202 and continues displaying the stationary visual content. If a navigation input is detected, control will be transferred to block 204 as shown.
[0035] Decision point 203 may be implemented by a continuous loop in software looking for a particular signal that detects movement, an interrupt system in hardware, or any other desired methodology. The particular technique utilized to detect and analyze the navigation request is not critical to the present invention. Regardless of the methodology used, the system can detect the request, thus indicating a desire to navigate the image. Although much of the discussion herein relates to zooming, it is noted that the techniques are applicable to zooming, panning, or otherwise navigating. Indeed, the techniques described herein are applicable to any type of dynamic transformation or change in perspective on the image. Such transformations may include, for example, three dimensional translation and rotation, application of an image filter, local stretching, dynamic spatial distortion applied to selected areas of the image, or any other kind of distortion that might reveal more information. Another example would be a virtual magnifying glass that can be moved over the image and which magnifies the parts of the image beneath it. When decision point 203 detects that a user is initiating navigation, block 204 will then render and display a new view of the image, which may be, for example, at a different resolution from the prior displayed view.
[0036] One straightforward prior art technique of displaying the new view is based upon interpolating LODs as the user zooms in or out. The selected LODs may be those two LODs that "surround" the desired resolution, i.e., the resolution of the new view. The interpolation, in prior systems, constantly occurs as the user zooms and is thus often implemented directly in the hardware to achieve speed.
The combination of detection of movement in decision point 205 and a substantially immediate display of an appropriate interpolated image at block 204 results in the image appearing to zoom continuously as the user navigates. During zooming in or out, since the image is moving, an interpolated image is sufficient to look realistic and clear. Any interpolation error is only minimally detectable by the human visual system, as such errors are disguised by the constantly changing view of the image.
[0037] At decision point 205, the system tests whether or not the movement has substantially ceased. This can be accomplished using a variety of techniques, including, for example, measuring the rate of change of one or more parameters of the view. That is, the methodology ascertains whether or not the user has arrived at the point where he has finished zooming. Upon such stabilization at decision point 205, control is transferred to block 206, where an exact image is rendered, after which control returns to block 203. Thus, at any desired resolution, the system will eventually display an exact LOD.
[0038] Notably, the display is not simply rendered and displayed by an interpolation of two predefined LODs, but may be rendered and displayed by re-rendering vector data using the original algorithm used to render the text or other vector data when the initial view was displayed at block 202. Nonvector data may also be resampled for rendering and displayed at the exact required LOD. The required re-rendering or resampling may be performed not only at the precise resolution required for display at the desired resolution, but also on a sampling grid corresponding precisely to the correct positions of the display pixels relative to the underlying content, as calculated based on the desired view.
As an example, translation of the image on the display by ½ a pixel in the display plane does not change the required resolution, but it does alter the sampling grid, and therefore requires re-rendering or resampling of the exact LOD.
[0039] The foregoing system of Fig. 2 represents a hybrid approach in which interpolation based upon predefined LODs is utilized while the view is changing (e.g. navigation is occurring) but an exact view is rendered and displayed when the view becomes substantially stationary.
[0040] For purposes of explanation herein, the term render refers to the generation by the computer of a tile at a specific LOD based upon vector or nonvector data. With respect to nonvector data, these may be rerendered at an arbitrary resolution by resampling an original image at higher or lower resolution.
[0041] We turn now to the methodology of rendering and displaying the different portions of the visual content needed to achieve an exact final image as represented by block 206 of Fig. 2. With reference to Fig. 3, when it is determined that navigation has ceased, control is transferred to block 303 and an interpolated image is immediately displayed, just as is the case during zooming. We call this interpolated image that may be temporarily displayed after the navigation ceases the intermediate final image, or simply an intermediate image. This image is generated from an interpolation of the surrounding LODs. In some cases, as explained in more detail below, the intermediate image may be interpolated from more than two discrete LODs, or from two discrete LODs other than the ones that surround the desired resolution.
[0042] Once the intermediate image is displayed, block 304 is entered, which causes the image to begin to gradually fade towards an exact rendition of the image, which we term the final image. The final image differs from the intermediate image in that the final image may not involve interpolation of any predefined LODs.
Instead, the final image, or portions thereof, may comprise newly rendered tiles. In the case of photographic data, the newly rendered tiles may result from resampling the original data, and in the case of vector data, the newly rendered tiles may result from rasterization at the desired resolution.
[0043] It is also noted that it is possible to skip directly from block 303 to 305, immediately replacing the interpolated image with a final and exact image. However, in the preferred embodiment, step 304 is executed so the changeover from the intermediate final image to the final image is done gradually and smoothly. This gradual fading, sometimes called blending, causes the image to come into focus gradually when navigation ceases, producing an effect similar to automatic focusing in cameras or other optical instruments. The illusion of physicality created by this effect is an important aspect of the present invention.
[0044] Following is a discussion of the manner in which this fading or blending may take place in order to minimize perceived irregularities, sudden changes, seams, and other imperfections in the image. It is understood however that the particular technique of fading is not critical to the present invention, and that many variations will be apparent to those of skill in the art.
[0045] Different LODs differ in the number of samples per physical area of the underlying visual content. Thus, a first LOD may take a 1 inch by 1 inch area of a viewable object and generate a single 32 by 32 sample tile. However, the information may also be rendered by taking the same 1 inch by 1 inch area and representing it as a tile that is 64 by 64 samples, and therefore at a higher resolution.
[0046] We define a concept called irrational tiling. Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher-resolution LOD to the linear tiling grid size at the next lower-resolution LOD.
In the Williams paper introducing trilinear interpolation, g = 2. This same value of g has been used in other prior art. Although LODs may be subdivided into tiles in any fashion, in an exemplary embodiment each LOD is subdivided into a grid of square or rectangular tiles containing a constant number of samples (except, as required, at the edges of the visual content). Conceptually, when g = 2, each tile at a certain LOD "breaks up" into 2x2=4 tiles at the next higher-resolution LOD (again, except potentially at the edges), as shown in Figure 4.
[0047] There are fundamental shortcomings in tilings of granularity 2. Usually, if a user zooms in on a random point in a tile, every g-fold increase in zoom will require the rendition of a single additional tile corresponding to the next higher-resolution LOD near the point toward which the user is zooming. However, if a user is zooming in on a grid line in the tiling grid, then two new tiles need to be rendered, one on either side of the line. Finally, if a user is zooming in on the intersection of two grid lines, then four new tiles need to be rendered. If these events (requests for 1, 2 or 4 new tiles with each g-fold zoom) are interspersed randomly throughout an extended zooming sequence, then overall performance will be consistent. However, a grid line in any integral-granularity tiling (i.e. where g is a whole number) remains a grid line for every higher-resolution LOD.
[0048] Consider, for example, zooming in on the center of a very large image tiled with granularity 2. We will write the (x,y) coordinates of this point as (½,½), adopting the convention that the visual content falls within a square with corners (0,0), (0,1), (1,0) and (1,1). Because the center is at the intersection of two grid lines, as the user reaches each higher-resolution LOD, four new tiles need to be rendered every time; this will result in slow performance and inefficiency for zooming on this particular point.
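To make the center-point example concrete, the following minimal Python sketch counts how many new tiles each successive LOD requires when zooming toward a fixed point. It rests on assumptions made purely for illustration: the LOD at level k is taken to divide the unit square into g^k by g^k tiles, a point lying exactly on an interior grid line is taken to need the tiles on both sides of that line, and the function names are hypothetical. Exact rational coordinates are used so that the grid-line test is not affected by rounding.

```python
from fractions import Fraction


def on_grid_line(coord, tiles_per_axis):
    """True if `coord` (a Fraction in [0, 1]) lies on an interior tile boundary."""
    scaled = coord * tiles_per_axis
    return scaled.denominator == 1 and 0 < scaled < tiles_per_axis


def new_tiles_per_level(x, y, levels, g=2):
    """Number of tiles needed around (x, y) at LOD levels 1..levels.

    `g` must be a whole number here (integral tiling granularity).
    """
    counts = []
    for level in range(1, levels + 1):
        n = g ** level                       # assumed tiles per axis at this LOD
        across = 2 if on_grid_line(x, n) else 1
        down = 2 if on_grid_line(y, n) else 1
        counts.append(across * down)
    return counts
```

Under these assumptions, zooming toward (½,½) with g = 2 requests four tiles at every level, while a point such as (¼,⅓) first meets a grid line at the second level and then stays on one at every higher-resolution level.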
Suppose, on the other hand, that the user zooms in on an irrational point, meaning a grid point (x,y) such that x and y cannot be expressed as the ratios of two whole numbers. Examples of such numbers are pi (=3.14159...) and the square root of 2 (=1.414213...). Then, it can easily be demonstrated that the sequence of 1's, 2's and 4's given by the number of tiles that need to be rendered for every g-fold zoom is quasi-random, i.e. follows no periodic pattern. This kind of quasi-random sequence is clearly more desirable from the point of view of performance; then there are no distinguished points for zooming from a performance standpoint.
[0049] Irrational tiling resolves this issue: g itself is taken to be an irrational number, typically the square root of 3, 5 or 12. Although this means that on average 3, 5 or 12 tiles (correspondingly) at a given LOD are contained within a single tile at the next lower-resolution LOD, note that the tiling grids at consecutive LODs no longer "agree" on any grid lines in this scheme (except potentially at the leading edges of the visual content, x=0 and y=0, or at some other preselected single grid line along each axis). If g is chosen such that it is not the nth root of any integer (pi is such a number), then no LODs will share any grid lines (again, potentially except x=0 and y=0). Hence it can be shown that each tile may randomly overlap 1, 2, or 4 tiles at the next lower LOD, whereas with g=2 this number is always 1.
[0050] With irrational tiling granularity, zooming in on any point will therefore produce a quasi-random stream of requests for 1, 2 or 4 tiles, and performance will be on average uniform when zooming in everywhere. Perhaps the greatest benefit of irrational tiling emerges in connection with panning after a deep zoom. When the user pans the image after having zoomed in deeply, at some point a grid line will be moved onto the display.
It will usually be the case that the region on the other side of this grid line will correspond to a lower-resolution LOD than the rest of the display; it is desirable, however, for the difference between these resolutions to be as small as possible. With integral g, however, the difference will often be extremely large, because grid lines can overlap over many consecutive LODs. This creates "deep cracks" in resolution over the node area, as shown in Figure 6(a).
[0051] On the other hand, because grid lines in an irrational tiling never overlap those of an adjacent LOD (again with the possible exception of one grid line in each direction, which may be at one corner of the image), discontinuities in resolution of more than one LOD do not occur. This increased smoothness in relative resolution allows the illusion of spatial continuity to be much more convincing.
[0052] Figure 6(b) illustrates the advantage gained by irrational tiling granularity. Figure 6 shows cross-sections through several LODs of the visual content; each bar represents a cross-section of a rectangular tile. Hence the second level from the top, in which there are two bars, might be a 2x2=4 tile LOD. The curves 601, drawn from top to bottom, represent the bounds of the visible area of the visual content at the relevant LOD during a zooming operation: as the resolution is increased (zooming in to reveal more detail), the area under examination decreases. Darker bars (e.g., 602) represent tiles which have already been rendered over the course of the zoom. Lighter bars have not yet been rendered, so cannot be displayed. Note that when the tiling is integral as in Figure 6(a), abrupt changes in resolution over space are common; if the user were to pan right after the zoom, then at the spatial boundary indicated by the arrow, four LODs would "end" abruptly. The resulting image would look sharp to the left of this boundary, and extremely blurry to the right.
The same visual content represented using an irrational tiling granularity lacks such resolution "cracks": adjacent LODs do not share tile boundaries, except as shown at the left edge. Mathematically, this shared boundary may occur at most in one position on the x-axis and at one position on the y-axis. In the embodiment shown these shared boundaries are positioned at y=0 and x=0, but, if present, they may also be placed at any other position.
[0053] Another benefit of irrational tiling granularity is that it allows finer control of g, since there are a great many more irrational numbers than integers, particularly over the useful range where g is not too large. This additional freedom can be useful for tuning the zooming performance of certain applications. If g is set to the irrational square root of an integer (such as sqrt(2), sqrt(5) or sqrt(8)), then in the embodiment described above the grid lines of alternate LODs would align exactly; if g is an irrational cube root, then every third LOD would align exactly; and so on. This confers an additional benefit with respect to limiting the complexity of a composite tiling, as defined below.
[0054] An important aspect of the invention is the order in which the tiles are rendered. More particularly, the various tiles of the various LODs are optimally rendered such that all visible tiles are rendered first. Nonvisible tiles may not be rendered at all. Within the set of visible tiles, rendition proceeds in order of increasing resolution, so that tiles within low-resolution LODs are rendered first. Within any particular LOD, tiles are rendered in order of increasing distance from the center of the display, which we refer to as foveated rendering. To sort such tiles in the described order, numerous sorting algorithms such as heapsort, quicksort, or others may be used.
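The rendering order just described (visible tiles first, then increasing resolution, then increasing distance from the center of the display) can be sketched as a sort on a tuple-valued key. This is a minimal Python illustration, not a definitive implementation: the `TileRequest` type, its fields and the function names are hypothetical, and in this sketch nonvisible tiles are simply sorted last rather than discarded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileRequest:
    """Hypothetical request to render one tile."""
    visible: bool
    resolution: float   # samples per physical unit
    center_x: float     # tile center in display coordinates
    center_y: float


def request_key(req, display_center):
    """Sort key: visibility, then resolution, then distance to display center."""
    cx, cy = display_center
    distance = math.hypot(req.center_x - cx, req.center_y - cy)
    # False sorts before True, so visible tiles (not visible == False) come first.
    return (not req.visible, req.resolution, distance)


def order_requests(requests, display_center):
    """Return the requests in the order in which they would be rendered."""
    return sorted(requests, key=lambda r: request_key(r, display_center))
```

Python compares tuples element by element, so a single `sorted` call yields the nested ordering; a priority queue keyed the same way would serve equally well when requests arrive incrementally.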
To implement this ordering, a lexicographic key may be used for sorting "requests" to render tiles, such that the outer subkey is visibility, the middle subkey is resolution in samples per physical unit, and the inner subkey is distance to the center of the display. Other methods for ordering tile rendering requests may also be used. The actual rendering of the tiles optimally takes place as a parallel process with the navigation and display described herein. When rendering and navigation/display proceed as parallel processes, user responsiveness may remain high even when tiles are slow to render.
[0055] We now describe the process of rendering a tile in an exemplary embodiment. If a tile represents vector data, such as alphabetic typography in a stroke based font, then rendering of the tile would involve running the algorithm to rasterize the alphabetic data and possibly transmitting that data to a client from a server. Alternatively, the data fed to the rasterization algorithm could be sent to the client, and the client could run the algorithm to rasterize the tile. In another example, rendering of a tile involving digitally sampled photographic data could involve resampling of that data to generate the tile at the appropriate LOD. For discrete LODs that are prestored, rendering may involve no more than simply transmitting the tile to a client computer for subsequent display. For tiles that fall between discrete LODs, such as tiles in the final image, some further calculation as described above may be required.
[0056] At any given time, when the tiles are rendered and the image begins to fade toward the exact image, the actual display may comprise different mixes of different tiles from different LODs. Thus, any portion of the display could contain, for example, 20% from LOD 1, 40% from LOD 2, and 40% from LOD 3.
Regardless of the tiles displayed, the algorithm attempts to render tiles from the various LODs in a priority order best suited to supply the rendered tiles for display as they are most needed. The actual display of the rendered tiles will be explained in more detail later with reference to Figure 5.
[0057] In what follows we describe a method for drawing the plural LODs using an algorithm which can guarantee spatial and temporal continuity of image detail. The algorithm is designed to make the best use of all rendered tiles, using high-resolution tiles in preference to lower-resolution tiles covering the same display area, yet using spatial blending to avoid sharp boundaries between LODs, and temporally graduated blending weights to blend in higher detail if and when it becomes available (i.e. when higher-resolution tiles have been rendered). Unlike the prior art, this algorithm and variants thereof can result in more than two LODs being blended together at a given point on the display; it can also result in blending coefficients that vary smoothly over the display area; and it can result in blending coefficients that evolve in time even after the user has stopped navigating. In this exemplary embodiment it is nonetheless computationally efficient, and can be used to render imagery as partially transparent, or with an overall transparency that varies over the image area, as will become apparent.
[0058] We define herein a composite tile area, or simply a composite tile. To define a composite tile we consider all of the LODs stacked on top of each other. Each LOD has its own tile grid. The composite grid is then formed by the projection of all of the grids from all of the LODs onto a single plane. The composite grid is then made up of various composite tiles of different sizes, defined by the boundaries of tiles from all of the different LODs. This is shown conceptually in Fig. 7. Fig. 7 depicts the tiles from three different LODs, 701 through 703, all representing the same image. One can imagine the LODs 701 through 703 being stacked up on top of each other. In such a case, if one lined up corner 750 from each of these LODs and stacked them on top of each other, an area represented by 740 would be inside the area represented by 730, and the areas represented by 730 and 740 would be inside the area represented by 720. Area 710 of Fig. 7 shows that there would be a single "composite tile" 710. Each of the composite tiles is examined during each frame, wherein the frame rate may be typically greater than ten frames per second. Note that, as explained above, this frame rate is not necessarily the display refresh rate.
[0059] Fig. 5 depicts a flow chart of an algorithm for updating the frame buffer as tiles are rendered. The arrangement of Fig. 5 is intended to operate on every composite tile in the displayed image each time the frame buffer is updated. Thus, for example, if a frame duration is 1/20 of a second, each of the composite tiles on the entire screen would preferably be examined and updated during each 1/20 of a second. When a composite tile is operated upon by the process of Fig. 5, the composite tile may lack the relevant tiles in one or more LODs. The process of Fig. 5 attempts to display each composite tile as a weighted average of all the available superimposed tiles within which the composite tile lies. Note that composite tiles are defined in such a way that they fall within exactly one tile at any given LOD; hence the weighted average can be expressed as a relative proportion of each LOD. The process attempts to determine the appropriate weights for each LOD within the composite tile, and to vary those weights gradually over space and time to cause the image to gradually fade towards the final images discussed above.
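The construction of the composite grid can be illustrated along a single axis with a minimal Python sketch. The assumptions here are illustrative only: each LOD's tiles are described by their extent in units of the underlying content (not in samples), all grids share the boundary at 0, and the function names are hypothetical. The sketch merges the grid lines of all LODs and reports, for each composite interval, the index of the one tile at each LOD that contains it.

```python
from bisect import bisect_right


def grid_lines(tile_size, extent):
    """Tile boundaries along one axis for an LOD whose tiles span `tile_size`."""
    lines = []
    k = 0
    while k * tile_size < extent:
        lines.append(k * tile_size)
        k += 1
    lines.append(extent)    # closing edge; the last tile may be narrower
    return lines


def composite_lines(lod_tile_sizes, extent):
    """Project the grid lines of every LOD onto one axis and merge them."""
    merged = set()
    for size in lod_tile_sizes:
        merged.update(grid_lines(size, extent))
    return sorted(merged)


def composite_tiles(lod_tile_sizes, extent):
    """List (start, end, owners): owners[i] is the tile index at LOD i."""
    lines = composite_lines(lod_tile_sizes, extent)
    per_lod = [grid_lines(size, extent) for size in lod_tile_sizes]
    tiles = []
    for start, end in zip(lines, lines[1:]):
        # No LOD grid line falls strictly inside (start, end), so the tile
        # whose left boundary is the last one <= start contains the interval.
        owners = [bisect_right(g, start) - 1 for g in per_lod]
        tiles.append((start, end, owners))
    return tiles
```

For instance, three LODs with tile extents 12, 6 and 4 over content of extent 12 give composite boundaries at 0, 4, 6, 8 and 12, and each of the four resulting composite intervals lies within exactly one tile of each LOD. A two-dimensional composite grid would apply the same merge independently to each axis.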
[0060] The composite grid includes plural vertices which are defined to be any intersection or corner of gridlines in the composite grid. These are termed composite grid vertices. We define an opacity for each LOD at each composite grid vertex. The opacity can be expressed as a weight between 0.0 and 1.0, and the sum of all the LOD weights at each vertex should therefore be 1.0 if the desired result is for the image to be totally opaque. The current weights at any particular time for each LOD at each vertex are maintained in memory.
[0061] The algorithm for updating vertex weights proceeds as described below.
[0062] The following variables, which are taken to be numbers between 0.0 and 1.0, are kept in memory for each tile: centerOpacity, cornerOpacity for each corner (4 if the tiling is a rectangular grid), and edgeOpacity for each edge (4 if the tiling is a rectangular grid). When a tile is first rendered, all of its opacities as just listed are normally set to 1.0.
[0063] During a drawing pass, the algorithm walks through the composite tiling once for each relevant LOD, beginning with the highest-resolution LOD. In addition to the per-tile variables, the algorithm maintains the following variables: levelOpacityGrid and opacityGrid. Both of these variables are again numbers between 0.0 and 1.0, and are maintained for each vertex in the composite tiling.
[0064] The algorithm walks through each LOD in turn, in order from highest-resolution to lowest, performing the following operations. First 0.0 is assigned to levelOpacityGrid at all vertices. Then, for each rendered tile at that LOD (which may be a subset of the set of tiles at that LOD, if some have not yet been rendered), the algorithm updates the parts of the levelOpacityGrid touching that tile based on the tile's centerOpacity, cornerOpacity and edgeOpacity values:
[0065] If the vertex is entirely in the interior of the tile, then it gets updated using centerOpacity.
[0066] If the vertex is e.g. on the tile's left edge, it gets updated with the left edgeOpacity.
[0067] If the vertex is e.g. on the top right corner, it gets updated with the top right cornerOpacity.
[0068] "Updating" means the following: if the pre-existing levelOpacityGrid value is greater than 0.0, then set the new value to the minimum of the present value and the value it is being updated with. If the pre-existing value is zero (i.e. this vertex has not been touched yet) then just set the levelOpacityGrid value to the value it is being updated with. The end result is that the levelOpacityGrid at each vertex position gets set to the minimum nonzero value with which it gets updated.
[0069] The algorithm then walks through the levelOpacityGrid and sets to 0.0 any vertices that touch a tile which has not yet been rendered, termed a hole. This ensures spatial continuity of blending: wherever a composite tile falls within a hole, at the current LOD, drawing opacity should fade to zero at all vertices abutting that hole.
[0070] In an enhanced embodiment, the algorithm can then relax all levelOpacityGrid values to further improve spatial continuity of LOD blending. The situation as described thus far can be visualized as follows: every vertex is like a tentpole, where the levelOpacityGrid value at that point is the tentpole's height. The algorithm has thus far ensured that at all points bordering on a hole, the tentpoles have zero height; and in the interior of tiles that have been rendered, the tentpoles are set to some (probably) nonzero value. In the extreme case, perhaps all the values inside a rendered tile are set to 1.0. Assume for purposes of illustration that the rendered tile has no rendered neighbors yet, so the border values are 0.0. We have not specified how narrow the "margin" is between a 0.0 border tentpole and one of the 1.0 internal tentpoles.
If this margin is too small, then even though the blending is technically continuous, the transition may be too sharp when measured as an opacity derivative over space. The relax operation smoothes out the tent, always preserving values of 0.0, but possibly lowering other tentpoles to make the function defined by the tent surface smoother, i.e. limiting its maximum spatial derivative. It is immaterial to the invention which of a variety of methods are used to implement this operation; one approach, for example, is to use selective low-pass filtering, locally replacing every nonzero value with a weighted average of its neighbors while leaving zeroes intact. Other methods will also be apparent to those skilled in the art. [0071] The algorithm then walks over all composite grid vertices, considering corresponding values of levelOpacityGrid and opacityGrid at each vertex: if levelOpacityGrid is greater than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0- opacityGrid. Then, again for each vertex, corresponding values of levelOpacityGrid are added to opacityGrid. Due to the previous step, this can never bring opacityGrid above 1.0. These steps in the algorithm ensure that as much opacity as possible is contributed by higher-resolution LODs when they are available, allowing lower-resolution LODs to "show through" only where there are holes. [0072] The final step in the traversal of the current LOD is to actually draw the composite tiles at the current LOD, using levelOpacityGrid as the per-vertex opacity values. In an enhanced embodiment, levelOpacityGrid can be multiplied by a scalar overallOpacity variable in the range 0.0 to 1.0 just before drawing; this allows the entire image to be drawn with partial transparency given by the overallOpacity. Note that drawing an image-containing polygon, such as a rectangle, with different opacities at each vertex is a standard procedure. 
It can be accomplished, for example, using industry- standard texture mapping functions using the OpenGL or Direct3D graphics libraries. In practice, the drawn opacity within the interior of each such polygon is spatially interpolated, resulting in a smooth change in opacity over the polygon. [0073] In another enhanced embodiment of the algorithm described above, tiles maintain not only their current values of centerOpacity, cornerOpacity and edgeOpacity (called the current values), but also a parallel set of values called targetCenterOpacity, targetComerOpacity and targetEdgeOpacity (called the target values). In this enhanced embodiment, the current values are all set to 0.0 when a tile is first rendered, but the the target values are all set to 1.0. Then, after each frame, the current values are adjusted to new values closer to the target values. This may be implemented using a number of mathematical formulae, but as an example, it can be done in the following way: newNalue = oldNalue*(l-b) + targetNalue*b, where b is a. rate in greater than 0.0 and less than 1.0. A value of b close to 0.0 will result in a very slow transition toward the target value, and a value of b close to 1.0 will result in a very rapid transition toward the target value. This method of updating opacities results in exponential convergence toward the target, and results in a visually pleasing impression of temporal continuity. Other formulae can achieve the same result. [0074] The foregoing describes the preferred embodiment of the present invention. The invention is not limited to such preferred embodiment, and various modifications consistent with the appended claims are included within the invention as well.
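The per-vertex bookkeeping of paragraphs [0068], [0071] and [0073] can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the function names (update_vertex, clamp_and_accumulate, step_toward_target) and the use of dictionaries keyed by vertex are hypothetical, and the hole-zeroing and relax steps of [0069] and [0070] are omitted.

```python
def update_vertex(level_opacity, vertex, value):
    """'Updating' rule of [0068]: an untouched vertex (0.0) takes the value,
    otherwise the vertex keeps the minimum of its present value and the value."""
    current = level_opacity[vertex]
    level_opacity[vertex] = value if current == 0.0 else min(current, value)


def clamp_and_accumulate(level_opacity, opacity):
    """Step of [0071]: cap this LOD's opacity at the remaining headroom
    (1.0 - opacityGrid), then add it to the running total."""
    for vertex in level_opacity:
        level_opacity[vertex] = min(level_opacity[vertex], 1.0 - opacity[vertex])
        opacity[vertex] += level_opacity[vertex]


def step_toward_target(old_value, target_value, b):
    """Per-frame update of [0073]: newValue = oldValue*(1-b) + targetValue*b."""
    return old_value * (1.0 - b) + target_value * b


# Two vertices, two LODs, walked from highest resolution to lowest.
# Vertex "b" lies in a hole at the fine LOD, so it stays at 0.0 there.
opacity_grid = {"a": 0.0, "b": 0.0}
fine = {"a": 0.0, "b": 0.0}
update_vertex(fine, "a", 0.25)           # hypothetical tile opacity
clamp_and_accumulate(fine, opacity_grid)

coarse = {"a": 0.0, "b": 0.0}
update_vertex(coarse, "a", 1.0)
update_vertex(coarse, "b", 1.0)
clamp_and_accumulate(coarse, opacity_grid)

print(fine, coarse, opacity_grid)
```

In this example the coarse LOD contributes 0.75 at vertex "a" and the full 1.0 at vertex "b", so the total opacity reaches 1.0 at both vertices and the coarse LOD shows through fully only at the hole.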
[FIG. 1 through FIG. 7: drawings not reproduced in this text.]
Title: SYSTEM AND METHOD FOR FOVEATED, SEAMLESS, PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the invention
The present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
Background of the invention
Most present-day graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent. In a more generalized zooming framework, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning. Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s[1]; recent movies continue the trend[2].
A number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present.[3] In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability"). The prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since.[4] To my knowledge, however, no major application based on a full ZUI (Zooming User Interface) has yet appeared on the mass market, due in part to a number of technical shortfalls, one of which is addressed in the present invention.
[1] e.g. Stanley Kubrick's 2001: A Space Odyssey, Turner Entertainment Company, a Time Warner company (1968).
[2] e.g. Steven Spielberg's Minority Report, 20th Century Fox and Dreamworks Pictures (2002).
[3] An early appearance is W.G. Donelson, Spatial Management of Information, Proceedings of Computer Graphics SIGGRAPH (1978), ACM Press, p. 203-9. A recent example is Zanvas.com, which launched in the summer of 2002.
Summary of the invention
The present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Noss) is based. Noss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture. This patent is specifically about Noss's approach to object tiling, level-of-detail blending, and render queueing.
A multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid). In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set. Such techniques are sometimes referred to as trilinear filtering or mipmapping. In most cases, mipmapped image pyramids are premade, and kept in short-term memory (i.e. RAM) continuously during the zooming operation; thus any required level of detail is always available. In some advanced 3D rendering scenarios, the image pyramid must itself be rendered within an animation loop; however, in these cases the complexity of this first rendering pass must be carefully controlled, so that overall frame rate does not suffer.
[4] Perlin describes subsequent developments at http://mrl.nyu.edu/projects/zui/.
In the present context, it is desirable to be able to navigate continuously by zooming and panning through an unlimited amount of content of arbitrary visual complexity. This content may not render quickly, and moreover it may not be available immediately, but may need to be downloaded from a remote location over a low-bandwidth connection.
It is thus not always possible to render levels of detail (first pass) at a frame rate comparable to the desired display frame rate (second pass). Moreover it is not in general possible to keep pre-made image pyramids in memory for all content; image pyramids must be rendered or re-rendered as needed, and this rendering may be slow compared to the desired frame rate.
The present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid. In combination, these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
A further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
A further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
A further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
A further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
A further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
These and other objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
Prior art: multiresolution imagery and zooming user interfaces
From a technical perspective, zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome. One such limitation is on the size of a document that can be "opened" from a computer application, as traditionally the entirety of such a document must be "loaded" before viewing or editing can begin. Even when the amount of short-term memory (normally RAM) available to a particular computer is large, this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long. Still digital images provide both an excellent example of this problem and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem. Table 1 below shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
SUBSTITUTE SHEET (RULE 26)
Table 1. [Table image not reproduced in this text.]
*Note that these figures represent realistic compressed sizes at intermediate quality, not raw image data. Specifically, we assume 1 bit/pixel for the sizes up to 40MB, and 0.25 bits/pixel for the larger images, which are generally more compressible.
**Local wireless networks may be considerably faster; this figure refers to wireless wide-area networks of the type often used for wireless PDAs.
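The arithmetic behind a table of this kind can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and the image dimensions and bandwidth in the example are chosen for round numbers rather than taken from Table 1. The only figures drawn from the notes above are the assumed compression rates of 1 bit/pixel and 0.25 bits/pixel.

```python
def compressed_size_bytes(width_px, height_px, bits_per_pixel):
    """Compressed size of an image, given an assumed average rate in bits per pixel."""
    return width_px * height_px * bits_per_pixel / 8


def download_seconds(size_bytes, bandwidth_bits_per_second):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_second


# Hypothetical example: a 1024x1024 image at 1 bit/pixel over a
# 1,048,576 bit/s link (both values chosen only to keep the numbers round).
size = compressed_size_bytes(1024, 1024, 1)
print(size, download_seconds(size, 1_048_576))
```

Because the transfer time scales linearly with the pixel count, doubling both image dimensions at a fixed rate quadruples the download time.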
Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size. Several years ago the dearth of large images was largely due to a shortage of storage space in repositories, but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottleneck now is bandwidth, followed by short-term memory (i.e. RAM) space. The problem is in reality much worse than suggested by the table above, because in most contexts the user is interested not only in viewing a single image, but an entire collection of images; if the images are larger than some modest size, then it becomes impractical to wait while one image downloads after another.
Modern image compression standards, such as JPEG2000[5], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured, using a much smaller amount of information, at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently (more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone).
[5] http://www.jpeg.org/JPEG2000.html
If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission. Properly implemented, it has the property that any image at all, no matter how large, can be viewed in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the repository is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the user than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right.
Hidden in this advance is a new concept of what it means to "open" an image which does not fit into the classical application model described in the previous section. We are now imagining that the user is able to view an image as it downloads, a concept whose usefulness arises from the fact that the broad strokes of the image are available very soon after download begins, and perhaps well before downloading is finished.
It therefore makes no sense for the application to force the user to wait while downloading finishes; the application should instead display what it can of the document immediately, and not cause delays or unnecessarily interrupt its interaction with the user while it continues downloading the details "in the background". This requires that the application do more than one task at once, which is termed multithreading. Note that most modern web browsers use multithreading in a slightly different capacity: to simultaneously download images on a web page, while displaying the web page's textual layout and remaining responsive to the user in the meantime. In this case we can think about the embedded images themselves as being additional levels of detail, which enhance the basic level of detail comprising the web page's bare-bones text layout. This analogy will prove important later.
Clearly hierarchical image representation and progressive transmission of the image document are an advance over linear representation and transmission. However, a further advance becomes important when an image, at its highest level of detail, has more information (i.e. more pixels) than the user's display can show at once. With current display technology, this is always the case for the bottom four kinds of images in Table 1, but smaller displays (such as PDA screens) may not be able to show even the bottom eight. This makes a zooming feature imperative for large images: it is useless to view an image larger than the display if it is not possible to zoom in to discover the additional detail.
When a large image begins to download, presumably the user is viewing it in its entirety. The first levels of detail are often so coarse that the displayed image will appear either blocky or blurry, depending on the kind of interpolation used to spread the small amount of information available over a large display area.
The image will then refine progressively, but at a certain point it will "saturate" the display with information, making any additional detail downloaded have no visible effect. It therefore makes no sense to continue the download beyond this point at all. Suppose, however, that the user decides to zoom in to see a particular area in much more detail, making the effective projected size of the image substantially larger than the physical screen. Then, in the downloading model described in the previous section, higher levels of detail would need to be downloaded, in increasing order. The difficulty is that every level of detail contains approximately four times the information of the previous level of detail; as the user zooms in, the downloading process will inevitably lag behind. Worse, most of the information being downloaded is wasted, as it consists of high-resolution detail outside the viewing area. Clearly, what is needed is the ability to download only selected parts of certain levels of detail: that is, only the detail which is visible should be downloaded. With this alteration, an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order. This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session. The computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
To make random access efficient, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. The resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1. The JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
Thus far we have considered only the case of static images, but the same techniques, with application-specific modifications, can be applied to nearly any type of visual document. This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages. Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined. Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc. Yet consider the problem of editing a text: the usual actions, such as inserting typed input, are only relevant over a certain range of spatial scales relative to the underlying document. If we have zoomed out so far that the text is no longer legible, then interactive editing is no longer possible. It can also be argued that interactive editing is no longer possible if we have zoomed so far in that a single letter fills the entire screen. Hence a zooming user interface may also restrict the action of certain methods to their relevant levels of detail.
When a visual document is not represented internally as an image, but as more abstract data (such as text, spreadsheet entries, or vector graphics), it is necessary to generalize the tiling concept introduced in the previous section. For still images, the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile. The speed bottleneck, moreover, is normally the transfer of compressed data to the computer (e.g. downloading). However, in some cases the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay. Hence we will refer to the production of a finished, fully drawn tile in response to a tile drawing request as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
A complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on. Hence documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document. We call each such document a node, borrowing from programming terminology for trees.
Although drawing methods are defined for all nodes at all levels of detail, other methods corresponding to application-specific functionality may be defined only for certain nodes, and their action may be restricted only to certain levels of detail. Hence some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment (a "supernode") which can be navigated by zooming and panning.
There are a number of immediate consequences for a well-implemented zooming user interface, including:
- It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
- Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
- Zooming and panning are spatially intuitive operations, allowing large amounts of information to be organized in an easily understood way.
- Since "screen space" is essentially unlimited, it is not necessary to minimize windows, use multiple desktops, or hide windows behind each other to work on multiple documents or views at once. Instead, documents can be arranged as desired, and the user can zoom out for an overview of all of them, or in on particular ones. This does not preclude the possibility of rearranging the positions (or even scales) of such documents to allow any combination of them to be visible at a useful scale on the screen at the same time. Neither does it necessarily preclude combining zooming with more traditional approaches.
- Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
- High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
- The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
These benefits are particularly valuable in the wake of the explosion in the amount of information available to ordinary computers connected to the Web. A decade ago, the kinds of very large documents which a ZUI enables one to view were rare, and moreover such documents would have taken up so much space that very few would have fit on the repositories available to most computers (e.g., a 40MB hard disk). Today, however, we face a very different situation: servers can easily store vast documents and document hierarchies, and make this information available to any client connected to the Web. Yet the bandwidth of the connection between these potentially vast repositories and the ordinary user is far lower than the bandwidth of the connection to a local hard disk. This is precisely the scenario in which the ZUI confers its greatest advantages over conventional graphical user interfaces.
Detailed description of the invention
For a particular view of a node at a certain desired resolution, there is some set of tiles, at a certain LOD, which would need to be drawn for the rendition to include at least one sample per screen pixel. Note that views do not normally fall precisely at the resolution of one of the node's LODs, but rather at an intermediate resolution between two of them. Hence, ideally, in a zooming environment the client generates the set of visible tiles at both of these LODs (just below and just above the actual resolution) and uses some interpolation to render the pixels on the display based on this information. The most common scenario is linear interpolation, both spatially and between levels of detail; in the graphics literature, this is usually referred to as trilinear interpolation. Closely related techniques are commonly used in 3D graphics architectures for texturing.[6] Unfortunately, downloading (or programmatically rendering) tiles is often slow, and especially during rapid navigation, not all the necessary tiles will be available at all times. The innovations in this patent therefore focus on a combination of strategies for presenting the viewer with a spatially and temporally continuous and coherent image that approximates this ideal image, in an environment where tile download or creation is happening slowly and asynchronously.
[6] S.L. Tanimoto and T. Pavlidis, A hierarchical data structure for picture processing, Computer Graphics and Image Processing, Vol. 4, p. 104-119 (1975); Lance Williams, Pyramidal Parametrics, ACM SIGGRAPH Conference Proceedings (1982).
In the following we use two variable names, f and g. f refers to the sampling density of a tile relative to the display, defined in #1. Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be constant over different levels of detail for a given node, although none of the innovations presented here rely on constant g. In the JPEG2000 example considered in the previous section, g=2: conceptually, each tile "breaks up" into 2x2=4 tiles at the next higher LOD. Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.
1. Level of detail tile request queuing. We first introduce a system and method for queuing tile requests that allows the client to bring a composite image gradually "into focus", by analogy with optical instruments. Faced with the problem of an erratic, possibly low-bandwidth connection to an information repository containing hierarchically tiled nodes, a zooming user interface must address the problem of how to request tiles during navigation. In many situations, it is unrealistic to assume that all such requests will be met in a timely manner, or even that they will be met at all during the period when the information is relevant (i.e. before the user has zoomed or panned elsewhere). It is therefore desirable to prioritize tile requests intelligently. The "outermost" rule for tile request queuing is increasing level of detail relative to the display. This "relative level of detail", which is zoom-dependent, is given by the number f = (linear tile size in tile pixels)/(projected tile length on the screen measured in screen pixels).
If f=1, then tile pixels are 1:1 with screen pixels; if f=10, then the information in the tile is far more detailed than the display can show (10x10=100 tile pixels fit inside a single screen pixel); and if f=0.1 then the tile is coarse relative to the display (every tile pixel must be "stretched", or interpolated, to cover 10x10=100 display pixels). This rule ensures that, if a region of the display is undersampled (i.e. only coarsely defined) relative to the rest of the display, the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f < 1, plus the next higher level of detail (to allow LOD blending; see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area: in general, an area considerably larger than the display. The coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first. This is an important point for viewing robustness. By robustness we mean that the client is never "at a loss" regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled. The client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node. Therefore, every spatial part of the node will always be renderable based on the first tile request alone; all subsequent tile requests can be considered incremental refinements.
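The relative level of detail and the "all levels with f < 1, plus the next higher" rule can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, and it assumes that tiles have the same pixel size at every LOD, so that f grows by a factor of g from one LOD to the next finer one.

```python
def relative_lod(tile_size_px, projected_size_px):
    """f = (linear tile size in tile pixels) / (projected length in screen pixels)."""
    return tile_size_px / projected_size_px


def lods_to_request(f_coarsest, g, num_lods):
    """Return LOD indices to queue, coarsest (index 0) first.

    Assumption: LOD k has relative level of detail f_coarsest * g**k.
    Every LOD with f < 1 is queued, plus the first LOD with f >= 1
    (the "next higher" level needed for LOD blending).
    """
    wanted = []
    for k in range(num_lods):
        wanted.append(k)
        if f_coarsest * g ** k >= 1.0:
            break  # this is the "next higher" level; finer ones are not needed
    return wanted
```

For example, with g=2 and a coarsest tile at f=0.1, the levels have f = 0.1, 0.2, 0.4, 0.8, 1.6, so the first five LODs are queued in that order and anything finer is skipped.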
Falling back on lower-resolution tiles creates the impression of blurring the image; hence the overall effect is that the display may appear blurry after a sizeable pan or zoom. Then, as tile requests are filled, the image sharpens. A simple calculation shows that the overhead created by requesting "redundant" lower-resolution tiles is in fact minor; in particular, it is a small price to pay for the robustness of having the node image well-defined everywhere from the start.

2. Foveated tile request queuing.

Within a relative level of detail, tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3. This technology is inspired by the human eye, which has a central region (the fovea) specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center. The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out; see Figure 4. The figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution. Initially the node contents are represented by a single, low-resolution tile; then tiles at the next LOD become available, making the node contents visible at twice the resolution with four (=2x2) tiles; 4x4=16 and 8x8=64 tile versions follow.
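Taken together, the rules of #1 and #2 amount to a two-part sort key: level of detail first, then distance from the screen center. The sketch below assumes a hypothetical TileRequest record holding the LOD index and the projected tile center in screen pixels; neither name comes from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileRequest:
    lod: int     # 0 = coarsest level of detail
    cx: float    # projected tile center on screen, in screen pixels
    cy: float


def queue_order(requests, screen_w, screen_h):
    """Sort requests coarsest LOD first, then nearest to the screen center."""
    sx, sy = screen_w / 2.0, screen_h / 2.0
    return sorted(
        requests,
        key=lambda r: (r.lod, math.hypot(r.cx - sx, r.cy - sy)),
    )
```

Because the LOD is the outer key, a peripheral coarse tile is still requested before a central fine tile, which preserves the robustness property of #1.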
In the second row, we follow what happens if the user were to zoom in on the shaded square before the image displayed in the top row is fully refined. Tiles at higher levels of detail are again queued, but in this case only those that are partially or fully visible. Refinement progresses to a point comparable to that of the top row (in terms of number of visible tiles on the display). The third row shows what is available if the user then zooms out again, and how the missing detail is filled in. Although all levels of detail are shown, note that in practice the very fine levels would probably be omitted from the displays on the bottom row, since they represent finer details than the display can convey. Note that zooming out normally leaves the center of the display filled with more detailed tiles than the periphery. Hence this ordering of tile requests consistently prioritizes the sharpness of the central area of the display during all navigation.

3. Temporal LOD blending.

Without further refinements, when a tile needed for the current display is downloaded or constructed and drawn for the first time, it will immediately obscure part of an underlying, coarser tile presumably representing the same content; the user experiences this transition as a sudden change in blurriness in some region of the display. Such sudden transitions are unsightly, and unnecessarily draw the user's attention to details of the software's implementation. Our general approach to ZUI design is to create a seamless visual experience for the user, which does not draw attention to the existence of tiles or other aspects of the software which should remain "under the hood". Therefore, when tiles first become available, they are not displayed immediately, but blended in over a number of frames, typically over roughly one second. The blending function may be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend-in interval the new tile is 50% opaque), exponential, or follow any other interpolating function. In an exponential blend, every small constant interval of time corresponds to a constant percent change in the remaining transparency; for example, the new tile may close 20% of the remaining gap to full opacity at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc. Mathematically, the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval. An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity. In our reference implementation, the illusion created is that regions of the display come smoothly into focus as the necessary information becomes available.

4. Continuous LOD.

In a situation in which tile download or creation is lagging behind the user's navigation, adjacent regions of the display may have different levels of detail. Although the previous innovation (#3) addresses the problem of temporal discontinuity in level of detail, a separate innovation is needed to address the problem of spatial discontinuity in level of detail. If uncorrected, these spatial discontinuities are visible to the user as seams in the image, with visual content drawn more sharply to one side of the seam. We resolve this problem by allowing the opacity of each tile to be variable over the tile area; in particular, this opacity is made to go to zero at a tile edge if this edge abuts a region on the display with a lower relative level of detail. It is also important in some situations to make the opacity at each corner of the tile go to zero if the corner touches a region of lower relative level of detail. Figure 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard. Tile X, bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge.
It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque. (Note that repeated lowercase letters indicate identical vertex opacity values.) However, the opacity of the surrounding rectangular frame is determined by whether the neighboring tiles are present (and fully opaque). Hence if tile TL is absent, then point g will be fully transparent; if L is absent, then points h will be fully transparent, etc. We term the border region of the tile (X outside iiii) the blending flaps. Figure 6 illustrates the reference method used to interpolate opacity over a shard. Part (a) shows a constant opacity rectangle. Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges. Part (c) shows a bilinear method for interpolating opacity over a triangle, when the opacities of all three corners abc may be different. Conceptually, every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C. The opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e. A, B and C divided by the total triangle area A+B+C). It is easily verified that this formula identically gives the opacity at a vertex when p moves to that vertex, and that if p is on the triangle edge then its opacity is a linear interpolation between the two connected vertices. Since the opacity within a shard is determined entirely by the opacities at its vertices, and neighboring shards always share vertices (i.e. there are no T-junctions), this method ensures that opacity will vary smoothly over the entire tiled surface.
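The area-weighted rule of part (c) is what graphics texts usually call barycentric interpolation. A minimal sketch follows; it assumes that the weight of each corner is the fractional area of the sub-triangle opposite that corner (the figure is not reproduced here, so this pairing is an assumption), and that p lies inside or on a non-degenerate triangle. The function names are illustrative.

```python
def tri_area(p, q, r):
    """Unsigned area of triangle pqr; points are (x, y) pairs."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def opacity_at(p, a, b, c, oa, ob, oc):
    """Opacity at point p inside triangle abc with corner opacities oa, ob, oc."""
    area_a = tri_area(p, b, c)  # sub-triangle opposite corner a
    area_b = tri_area(p, a, c)  # sub-triangle opposite corner b
    area_c = tri_area(p, a, b)  # sub-triangle opposite corner c
    total = area_a + area_b + area_c  # equals the area of abc when p is inside
    return (area_a * oa + area_b * ob + area_c * oc) / total
```

With a=(0,0), b=(1,0), c=(0,1), evaluating at a returns oa exactly, and evaluating at the midpoint of edge ab returns the average of oa and ob, matching the two properties stated above.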
In combination with the temporal LOD blending of #3, this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.

5. Generalized linear-mipmap-linear LOD blending.

We have discussed strategies and reference implementations for ensuring spatial and temporal smoothness in apparent LOD over a node. We have not yet addressed, however, the manner in which levels of detail are blended during a continuous zooming operation. The method used is a generalization of trilinear interpolation, in which adjacent levels of detail are blended linearly over the intermediate range of scales. At each level of detail, each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time. The target opacity is 100% if the level of detail undersamples the display, i.e. f<1 (see #1). However, if it oversamples the display, then the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold. Like trilinear interpolation, this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
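The target opacity of #5 and the exponential blend of #3 can be sketched together: the first function gives the opacity a level of detail is heading toward, and the second moves the opacity as drawn a fixed fraction of the way there each frame. Both function names are hypothetical, and the ramp is assumed to be linear in f itself (the text permits any monotonic function, so a ramp linear in log f would be equally valid).

```python
def target_opacity(f, g):
    """Target opacity for a level of detail with relative LOD f, granularity g > 1."""
    if f <= 1.0:
        return 1.0          # undersamples (or matches) the display: fully opaque
    if f >= g:
        return 0.0          # g-fold oversampling: fully transparent
    return (g - f) / (g - 1.0)  # assumed linear ramp from 1 at f=1 to 0 at f=g


def exponential_blend_step(current, target, rate=0.2):
    """Advance the drawn opacity one frame, closing `rate` of the remaining gap."""
    return current + rate * (target - current)
```

Starting from 0 with a target of 1 and a rate of 0.2, repeated calls give opacities of 1 - 0.8^n after n frames, which round to the 20%, 36%, 49%, 59%, ... sequence quoted in #3.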
However, unlike conventional trilinear interpolation, which always involves a blend of two levels of detail, the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet. A single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in" at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in. The simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold). The target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete. The highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above. Clearly this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More efficient implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.

6. Motion anticipation.

During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction). Thus we exploit this temporal motion coherence to generate tile requests slightly ahead of time, thus improving visual quality.
This is accomplished by making tile requests using a virtual viewport which elongates, dilates or contracts in the direction of motion when panning or zooming, thus pre-empting requests for additional tiles. When navigation ceases, the virtual viewport relaxes over a brief interval of time back to the real viewport.
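One way to picture the virtual viewport is as the real viewport extrapolated linearly a short time ahead. The sketch below is an assumption-laden illustration rather than the reference method: Viewport, virtual_viewport and the lookahead parameter are hypothetical names, pan velocity is given in image units per second, and zoom_rate is the fractional change in viewport size per second (positive when zooming out). The gradual relaxation after navigation stops is not modeled; with zero velocities the function simply returns the real viewport.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    x: float  # left edge, image coordinates
    y: float  # top edge, image coordinates
    w: float
    h: float


def virtual_viewport(view, vx, vy, zoom_rate, lookahead=0.25):
    """Extrapolate the viewport `lookahead` seconds ahead for tile requests."""
    # Dilate (zooming out) or contract (zooming in) about the center.
    scale = max(0.0, 1.0 + zoom_rate * lookahead)
    w, h = view.w * scale, view.h * scale
    x = view.x + (view.w - w) / 2.0
    y = view.y + (view.h - h) / 2.0
    # Elongate toward the direction of panning.
    dx, dy = vx * lookahead, vy * lookahead
    if dx < 0:
        x += dx
    if dy < 0:
        y += dy
    return Viewport(x, y, w + abs(dx), h + abs(dy))
```

Tiles intersecting the returned rectangle would then be queued using the ordering rules of #1 and #2.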
Note that none of the above innovations are restricted to rectangular tilings; they generalize in an obvious fashion to any tiling pattern which can be defined on a grid, such as triangular or hexagonal tiling, or heterogeneous tilings consisting of mixtures of such shapes, or entirely arbitrary tilings. The only explicit change which needs to be made to accommodate such alternate tilings is to define triangulations of the tile shapes analogous to those of Figure 5, such that the opacities of the edges and the interior can all be controlled independently.
[Figure images omitted. Recoverable caption: "linear interpolation of opacity over a polygon".]
Title: SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTIRESOLUTION VISUAL DATA
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the Invention The present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time. The invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
Background of the invention
In many situations involving the display of complex visual data, these data are stored or computed hierarchically, as a collection of representations at different levels of detail (LODs). Many multiresolution methods and representations have been devised for different kinds of data, including (for example, and without limitation) wavelets for digital images, and progressive meshes for 3D models. Multiresolution methods are also used in mathematical and physical simulations, in situations where a possibly lengthy calculation can be performed more "coarsely" or more "finely"; this invention also applies to such simulations, and to other situations in which multiresolution visual data may be generated interactively. Further, the invention applies in situations in which visual data can be obtained "on the fly" at different levels of detail, for example, from a camera with machine-controllable pan and zoom. The present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).

In explaining the invention we will use as our main example the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format). This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more. This large size can present considerable difficulties for interactive display, especially when such images are to be browsed remotely, in environments where the server (where the image is stored) is connected to the client (where the image is to be viewed) by a low-bandwidth connection. If the image data are sent from the server to the client in simple raster order, then all the data must be transmitted before the client can generate an overview of the entire image. This may take a long time. Generating such an overview may also be computationally expensive, perhaps, for example, requiring downsampling a 20,000x20,000 pixel image to 500x500 pixels. Not only are such operations too slow to allow for interactivity, but they also require that the client have sufficient memory to store the full image data, which in the case just cited is 1.2 gigabytes (GB) for an 8-bit RGB color image (=3*20,000^2).
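The figures in this example can be checked with a few lines of arithmetic. The variable names are illustrative only; the one assumption is that "8-bit RGB" means three one-byte channels per pixel with no padding.

```python
side = 20_000                 # pixels per side of the full image
bytes_per_pixel = 3           # 8-bit R, G and B channels
raw_bytes = bytes_per_pixel * side ** 2   # uncompressed size in bytes

overview_side = 500
# Number of source pixels that contribute to each pixel of the overview.
pixels_per_overview_pixel = (side // overview_side) ** 2

print(raw_bytes / 1e9, "GB;", pixels_per_overview_pixel, "source pixels per overview pixel")
```

So the client would have to hold 1.2 billion bytes and reduce 1,600 source pixels into each overview pixel before showing anything.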
Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size.

Several years ago the dearth of large images was largely due to a shortage of non-volatile storage space (repository space), but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottlenecks now are bandwidth, followed by short-term memory (i.e. RAM) space.

Modern image compression standards, such as JPEG2000 [1], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. We refer to the factor by which each resolution differs in size from the next higher (here 2) as the granularity, which we represent by the variable g. The granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid".

[1] http://www.jpeg.org/JPEG2000.html

Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured, using a much smaller amount of information, at the low resolutions. This is why the differently-sized images or scales are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently; more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone.

If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in a server's repository, then a natural consequence is that if the image is transferred from a server to a client, the client can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission, and is one of the major advantages of multiresolution representations. When progressive transmission is properly implemented, any image at all, no matter how large, can be viewed by a client in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the server is very modest.
Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the client than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right.
To make random access efficient in a dynamic and interactive context, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. Hence if we assume 64x64 pixel tiles, the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail. The JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images. If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, are needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
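The pyramid and tile counts quoted for the 512x512 example can be reproduced with a short sketch. The function names are illustrative, and the code assumes a square image, an integer granularity g, and a nominal 64-pixel tile.

```python
import math


def pyramid_sizes(side, g=2):
    """Side lengths of the image pyramid, finest level first, down to 1 pixel."""
    sizes = [side]
    while sizes[-1] > 1:
        sizes.append(max(1, sizes[-1] // g))
    return sizes


def tiles_per_side(side, tile=64):
    """Number of tiles along one side; a partial edge tile still counts as one."""
    return math.ceil(side / tile)


for s in pyramid_sizes(512):
    n = tiles_per_side(s)
    print(f"{s}x{s} level: {n}x{n} tiles")
```

For a side of 512 this lists 8x8, 4x4 and 2x2 tiles for the three finest levels and a single tile for every level from 64x64 downward, in agreement with the text.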
In a previously filed provisional patent application, I have proposed methods for "fading out" the edges of tiles where they abut a blank space at the same level of detail; this avoids the abrupt visual discontinuity in sharpness that would otherwise result when the "coverage" of a fine level of detail is incomplete. The edge regions of tiles reserved for blending are referred to as blending flaps. The simple reference implementation for displaying a finished composite image is a "painter's algorithm": all relevant tiles (that is, tiles overlapping the display area) in the coarsest level of detail are drawn first, followed by all relevant tiles in progressively finer levels of detail. At each level of detail blending was applied at the edges of incomplete areas as described. The result, as desired, is that coarser levels of detail "show through" only in places where they are not obscured by finer levels of detail. Although this simple algorithm works, it has several drawbacks: first, it is wasteful of processor time, as tiles are drawn even when they will ultimately be partly or even completely obscured. In particular, a simple calculation shows that each display pixel will often be (re)drawn log2(f) times, where f is the magnification factor of the display relative to the lowest level of detail. Second, this technique relies on compositing in the framebuffer, meaning that, at intermediate points during the drawing operation, the regions drawn do not have their final appearance; this makes it necessary to use double-buffering or related methods and perform the compositing off-screen to avoid the appearance of flickering resolution. Third, unless an additional compositing operation is applied, this technique can only be used for an opaque rendition; it is not possible, for example, to ensure that the final rendition has 50% opacity everywhere, allowing other content to "show through".
This is because the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.

The Invention

The present invention resolves these issues, while preserving all the advantages of the painter's algorithm. One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application. Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons. For a tiling to be useful in a multiresolution context it is generally desirable that the areas of tiles at lower levels of detail be larger than the areas of tiles at higher levels of detail; the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant. In the following, an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art.

The improved algorithm consists of four stages. In the first stage, a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn. When the irrational tiling innovation (detailed in a separate provisional patent application) is used, this results in an irregular composite grid, shown schematically in Figure 1. The grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing).
This composite grid, which can be defined by a sorted list of x- and y-values for the grid lines, has the property that the vertices of all of the rectangles and triangles that would be needed to draw all visible tiles (including their blending flaps) lie at the intersection of an x and y grid line. Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n * m table, with entries corresponding to the squares of the grid. Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.

The second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naive implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly. The per-tile opacity used in this step is stored in the tile data structure. When this second stage is complete, the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in tile shards where they are partly visible.

The third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%). This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application. The refinement comes from the fact that the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.)
This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards.

Finally, in the fourth stage the composite grid is again traversed, and the tile shards are actually drawn. Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naive algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naive algorithm, which draws every tile intersecting with the displayed area in its entirety. An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naive algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.

When the view is rotated in the x-y plane relative to the node, some minor changes need to be made for efficiency. The composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal. However, when walking through tiles, we need only consider tiles that are visible (by the simple intersecting polygon criterion). Also, composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage. Note that a number of other implementation details can be modified to optimize performance; the algorithm is presented here in a form that makes its operation and essential features easiest to understand.
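The first two stages of the improved algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's implementation: Tile, Cell and build_composite_grid are hypothetical names, tiles are axis-aligned rectangles, blending-flap grid lines and the stage-three vertex smoothing are omitted, and "increase the opacity accordingly" is interpreted as front-to-back compositing, where a tile contributes its own opacity times whatever opacity is still unclaimed in the cell.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tile:
    lod: int        # higher value = finer level of detail
    x0: float
    y0: float
    x1: float
    y1: float
    opacity: float  # per-tile opacity in [0, 1]


@dataclass
class Cell:
    opacity: float = 0.0
    shards: list = field(default_factory=list)  # (tile, draw_opacity) pairs


def build_composite_grid(tiles):
    # Stage 1: composite grid lines are the sorted, de-duplicated tile edges.
    xs = sorted({x for t in tiles for x in (t.x0, t.x1)})
    ys = sorted({y for t in tiles for y in (t.y0, t.y1)})
    cells = [[Cell() for _ in xs[:-1]] for _ in ys[:-1]]
    # Stage 2: walk tiles from finest to coarsest.
    for t in sorted(tiles, key=lambda t: t.lod, reverse=True):
        for j in range(bisect_left(ys, t.y0), bisect_left(ys, t.y1)):
            for i in range(bisect_left(xs, t.x0), bisect_left(xs, t.x1)):
                cell = cells[j][i]
                share = (1.0 - cell.opacity) * t.opacity
                if share > 0.0:  # skip cells that are already fully covered
                    cell.shards.append((t, share))
                    cell.opacity += share
    return xs, ys, cells
```

In this sketch a coarse tile lying entirely under fully opaque finer tiles never appears in any cell's shard list, which is the source of the savings over the painter's algorithm. Lowering the per-cell cap from 1.0 to another total would give the partially transparent rendition mentioned above.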
A graphics programmer skilled in the art can easily add the optimizing implementation details. For example, it is not necessary to keep a list of tiles per tile shard; instead, each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time. Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be kept track of, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed. The algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers. This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in time of order n*log(n) (where n is the number of vertices or triangles added). The resulting triangles are moreover the basic primitive used for graphics rendering on most graphics platforms.
FIGURE 1: Construction of a composite node grid from superimposed irrational LOD tilings. Panels: (a) finest LOD; (b) next-finest LOD, g = sqrt(3); (c) composite grid.
METHODS AND APPARATUS FOR EMPLOYING IMAGE NAVIGATION TECHNIQUES TO ADVANCE COMMERCE
BACKGROUND OF THE INVENTION
The present invention is directed to methods and apparatus for the application of image navigation techniques in advancing commerce, for example, by way of providing new environments for advertising and purchasing products and/or services.
Digital mapping and geospatial applications are a booming industry. They have been attracting rapidly increasing investment from businesses in many different markets, including candidates like Federal Express, clothing stores and fast food chains. In the past several years, mapping has also become one of the very few software applications on the web that generate significant interest (so-called "killer apps"), alongside search engines, web-based email, and matchmaking.
Although mapping should in principle be highly visual, at the moment its utility for end users lies almost entirely in generating driving directions. The map images which invariably accompany the driving directions are usually poorly rendered, convey little information, and cannot be navigated conveniently, making them little more than window dressing. Clicking on a pan or zoom control causes a long delay, during which the web browser becomes unresponsive, followed by the appearance of a new map image bearing little visual relationship to the previous image. Although in principle computers should be able to navigate digital maps more effectively than we navigate paper atlases, in practice visual navigation of maps by computer is still inferior.
DESCRIPTION OF THE INVENTION
The present invention is intended to be employed in combination with a novel technology permitting continuous and rapid visual navigation of a map (or any other image), even over a low bandwidth connection. This technology relates to new techniques for rendering maps continuously in a panning and zooming environment. It is an application of fractal geometry to line and point rendering, allowing networks of roads (1D curves) and dots marking locations (0D points) to be drawn at all scales, producing the illusion of continuous physical zooming, while still keeping the "visual density" of the map bounded. Related techniques apply to text labels and iconic content. This new approach to rendition avoids such effects as the sudden appearance or disappearance of small roads during a zoom, an adverse effect typical of digital map drawing. The details of this navigation technology may be found in U.S. Patent Application No.: , filed on even date herewith, entitled METHODS AND APPARATUS FOR NAVIGATING AN IMAGE, Attorney Docket No.: 489/9, the entire disclosure of which is hereby incorporated by reference. This navigation technology may be referred to herein as "Voss."
The use of the Voss technology enables a number of novel and commercially valuable business models for Internet mapping. These models take as their point of departure the proven success of businesses like Yahoo! Maps and MapQuest, both of which generate revenue from geographical advertising. Our approach, however, goes well beyond advertising, capitalizing on the ability of new technology to add substantial value for both businesses and end users. The essential idea is to allow businesses and people to rent "real estate" on the map, normally at their physical address, in which they can embed zoomable content. This content can appear in an iconic form (e.g., golden arches for McDonald's) when viewed on a large-scale map, but then smoothly and continuously resolve into any kind of web-like content when viewed closely. By integrating our mapping application with dynamic user content and business/residential address data, we can thus enable a kind of "geographical world wide web".
In addition to large horizontal markets for both consumers and retailers, synergies with existing geographical information service (GIS) providers and related industries would obtain, such as car navigation systems, cell phones and PDAs, real estate rentals or sales, classified advertising, and many more. Possible business relationships in these areas include technology licensing, strategic partnership, and direct vertical sales.
The capabilities of the new navigation techniques of the present invention are described in detail in the aforementioned U.S. patent application. For this application, the most relevant aspects of the base technology are:
- smooth zooming and panning through a 2D world with perceptual continuity and advanced bandwidth management;
- an infinite-precision coordinate system, allowing visual content to be nested without limit;
- the ability to nest content stored on many different servers, so that spatial containment is equivalent to a hyperlink.
With respect to the latter two elements, additional details may be found in U.S. Provisional patent application No.: 60/474,313, filed May 30, 2003, entitled SYSTEM AND METHOD FOR INFINITE PRECISION COORDINATES IN A ZOOMING USER INTERFACE, the entire disclosure of which is hereby incorporated by reference.
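To make the idea of nesting content "without limit" concrete, the following is a minimal sketch only. It does not reproduce the scheme of the referenced provisional application; it merely assumes that each piece of nested content is placed by an offset and a scale relative to its parent, and it uses Python's exact rational type `fractions.Fraction` so that no precision is lost at any depth. The names `nest`, `nest_repeatedly` and `IDENTITY` are hypothetical.

```python
from fractions import Fraction

# A placement is (origin_x, origin_y, scale): it maps a child's unit square
# into the coordinate system of its parent.
IDENTITY = (Fraction(0), Fraction(0), Fraction(1))


def nest(parent, child):
    """Compose two placements, giving the child's placement in the parent's frame."""
    px, py, ps = parent
    cx, cy, cs = child
    return (px + ps * cx, py + ps * cy, ps * cs)


def nest_repeatedly(child, depth):
    """Place the same child inside itself `depth` times, starting from the root."""
    frame = IDENTITY
    for _ in range(depth):
        frame = nest(frame, child)
    return frame


third = Fraction(1, 3)
x, y, scale = nest_repeatedly((third, third, third), 100)
```

After 100 levels of nesting the scale is exactly 1/3^100 and the origin is still represented exactly, which a fixed-width floating-point coordinate could not guarantee.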
A map consists of many layers of information; ultimately, the Voss map application will allow the user to turn most of these layers on and off, making the map highly customizable. Layers include:
1. roads;
2. waterways;
3. administrative boundaries;
4. aerial photography-based orthoimagery (aerial photography which has been digitally "unwarped" such that it tiles a map perfectly);
5. topography;
6. public infrastructure locations, e.g. schools, churches, public telephones, restrooms;
7. labels for each of the above;
8. cloud cover, precipitation and other weather conditions;
9. traffic conditions;
10. advertising; and
11. personal and commercial user content; etc.
The most salient layers from the typical user's point of view are 1-4 and 7. The advertising and user content layers 10-11, which are of particular interest in this patent application, are also of significant interest. Many of the map layers, including 1-7, are already available, at high quality and negligible cost, from the U.S. Federal Government. Value-added layers like 8-9 (and others) can be made available at any time during development or even after deployment.
Several companies offer enhanced road data with annotations indicating one-way streets, entrance and exit ramps on highways, and other features important for generating driving directions but not for visual geography. The most relevant commercial geographic information service (GIS) offering for our application is geocoding, which enables the conversion of a street address into precise latitude/longitude coordinates. It has been determined that obtaining geocoding services will not be prohibitive.
In addition to map data, national Yellow Pages/White Pages data may also be valuable in implementing the present invention. This information may also be licensed. National Yellow Pages/White Pages data may be used in combination with geocoding to allow geographical user searches for businesses, or filtering (e.g. "highlight all restaurants in Manhattan"). Perhaps most importantly, directory listings combined with geocoding will greatly simplify associating business and personal users with geographic locations, allowing "real estate" to be rented or assigned via an online transaction and avoiding the need for a large sales force.
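The combination of directory listings and geocoding described above can be sketched as a simple filter. This is an illustration only: the `Listing` and `BoundingBox` types, the `highlight` function, the sample listings, and the rough rectangular region standing in for Manhattan are all made up for the example, and the listings are assumed to have been geocoded already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    category: str
    lat: float   # latitude in degrees, as a geocoder would return it
    lon: float   # longitude in degrees


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def highlight(listings, category, region):
    """Return the listings of one category that fall inside the region."""
    return [item for item in listings
            if item.category == category and region.contains(item.lat, item.lon)]


# Hypothetical data: a rough rectangle and three invented listings.
REGION = BoundingBox(south=40.70, west=-74.02, north=40.88, east=-73.91)
LISTINGS = [
    Listing("Example Diner", "restaurant", 40.75, -73.99),
    Listing("Example Hardware", "hardware", 40.76, -73.98),
    Listing("Faraway Grill", "restaurant", 34.05, -118.24),
]
matches = highlight(LISTINGS, "restaurant", REGION)
```

A real region would be a polygon rather than a rectangle; the rectangle keeps the sketch short.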
National telephone and address databases designed for telemarketing can be obtained inexpensively on CDs, but these are not necessarily of high quality: their coverage is usually only partial, and they are often out of date. A number of companies offer robust directory servers with APIs designed for software-oriented businesses like ours. Among the best is W3Data (www.w3data.com), which offers what it calls "neartime" national telephone listings using an XML-based API for $500/month minimum, beginning at $0.10/hit and going down to $0.05/hit for volumes of 250,000/month, or $0.03/hit for volumes over 1,000,000/month. The entire U.S. and Canada are covered. Reverse queries are also possible, i.e. looking up a name given a telephone number. The "neartime" data are updated at least every 90 days. Combined with 90-day caching of entries already obtained on our end, this is a very economical way to obtain high-quality national listings. "Realtime" data, updated nightly, are also available, but are more expensive ($0.20/hit). The realtime data are identical to those used by 411 operators.
Service providers similar to W3Data who can produce national business listings by category also exist, with comparable business and pricing models.
Classic advertising-based business models, as well as "media player" models involving a proprietary data format and a downloadable plug-in (such as Flash, Adobe Acrobat, and Real Player) normally face a chicken-and-egg problem. An advertising venue only becomes worth advertising in when people are already looking; a plug-in (even if free) only becomes worth downloading when there is already useful content to view; and content only becomes attractive to invest in making if there is already an installed user base ready to view it.
Although the Voss mapping application both requires downloadable client software and generates revenue through advertising, it will not suffer the disadvantages of classic advertising-based business models. Even before any substantial commercial space has been "rented", the present invention will provide a useful and visually compelling way of viewing maps and searching for addresses, that is, similar functionality to that of existing mapping applications, but with a greatly improved visual interface. Furthermore, the approach of the present invention provides limited but valuable service to non-commercial users free of charge to attract a user base. The limited service consists of hosting a small amount (5-15 MB) of server space per user, at the user's geographical location (typically a house). The client software may include simple authoring capabilities, allowing users to drag and drop images and text into their "physical address", which can then be viewed by any other authorized user with the client software. (Password protection may be available.) Because the zooming user interface approach is of obvious benefit for navigating digital photo collections, especially over limited bandwidth, the photo album sharing potential alone may attract substantial numbers of users. Additional server space may be available for a modest yearly fee. This very horizontal market is likely to be a major source of revenue.
The usual chicken-and-egg problem is thus avoided by providing valuable service from the outset, an approach that has worked well for search engines and other useful (and now profitable) web services. This contrasts sharply with online dating services, for example, which are not useful until the user base is built up.
In accordance with various aspects of the invention, the sources of revenue may include:
1. Commercial "rental" of space on the map corresponding to a physical address;
2. Fees for "plus services" (defined below) geared toward commercial users;
3. Fees for "plus services" geared toward non-commercial users;
4. Professional zoomable content authoring software;
5. Licensing or partnerships with PDA, cell phone, car navigation system, etc. vendors and service providers;
6. Information.
Basic commercial rental of space on a map can be priced using a combination of the following variables:
1. Number of sites on the map;
2. Map area ("footprint") per site, in square meters;
3. Desirability of the real estate, based on aggregate viewing statistics;
4. Server space needed to host content, in MB.
Plus services for commercial users are geared toward franchises, businesses wishing to conduct e-commerce or make other more sophisticated use of the web, and businesses wishing to increase their advertising visibility:
1. Greater visible height: several levels may be offered, allowing visible but unobtrusive icons or "flags" to indicate the locations of business sites from farther away (more zoomed out) than they would otherwise be visible.
2. Focusing priority: Voss brings areas of the image into focus during navigation as data become available. By default, all visual content is treated equally, and focusing goes from the center of the screen outward. Focusing priority allows commercial content to come into focus faster than it would otherwise, increasing its prominence in the user's "peripheral vision". This feature will be tuned to deliver commercial value without compromising the user's navigation experience.
3. Including a conventional web hyperlink in the zoomable content: these may be clearly marked (e.g., with the conventional underlined blue text) and, on the user's click, open a web browser. We can either charge for including such a hyperlink, or, like Google, charge per click.
4. Making the geographic area rented refer to an outside commercial server, which will itself host zoomable content of any type and size: this is a fancier version of #3, and allows any kind of e-business to be conducted via the map. We can once again either charge a flat fee or charge per user connecting to the outside server (though this no longer requires a click, just a zoom-in).
5. Billboards: as in real life, many high-visibility areas of the map will have substantial empty space. Companies can buy this space and insert content, including hyperlinks and "hyperjumps", which if clicked will make the user jump through space to a commercial site elsewhere on the map. In contrast to ordinary commercial space, billboard space need not be rented at a fixed location; its location can be generated on the fly during user navigation.
This last service raises issues of the "ecology" or visual aesthetics and usability of the map. It is desirable that the map is attractive and remains a genuine service for users, which implies limits on advertising and tasteful "zoning regulations". If the map becomes too cluttered with billboards or other content which does not mirror real world geography, then users will be turned off and the value of the map as an advertising and e-commerce venue will go down.
As usage statistics are gathered, the value of many of these commercial "plus services" may be quantitatively demonstrable. Quantitative evidence of a competitive advantage should increase sales of these extras.
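The "focusing priority" plus service listed above (focusing from the center of the screen outward, with paid content coming into focus sooner) could be approximated by a priority queue. The sketch below is one possible reading, not the method of the invention: the function `focus_order`, the `boost` factors, and the rule of dividing the distance by the boost are all assumptions made for illustration.

```python
import heapq
import math


def focus_order(tiles, center, boost=None):
    """Yield tile ids in the order they should be brought into focus.

    tiles:  dict mapping tile id -> (x, y) screen position of the tile center
    center: (x, y) position of the screen center
    boost:  dict mapping tile id -> factor > 1 for content with paid priority
    """
    boost = boost or {}
    heap = []
    for tile_id, (x, y) in tiles.items():
        distance = math.hypot(x - center[0], y - center[1])
        # Dividing by the boost makes prioritized content behave as if it
        # were closer to the center of the screen than it really is.
        heapq.heappush(heap, (distance / boost.get(tile_id, 1.0), tile_id))
    while heap:
        yield heapq.heappop(heap)[1]


tiles = {"a": (0, 0), "b": (100, 0), "c": (300, 0)}
default_order = list(focus_order(tiles, (0, 0)))
boosted_order = list(focus_order(tiles, (0, 0), boost={"c": 4.0}))
```

Keeping the boost factor bounded is one way to tune the feature so that content near the center of the screen still comes into focus first.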
Plus services geared toward non-commercial users will consist of some of the same products, but scaled, priced and marketed differently.
Limited authoring of zoomable Voss content will be possible from within the free client. This will include inserting text, dragging and dropping digital photos, and setting passwords. Professional authoring software may be a modified version of the client designed to allow more flexible zoomable content creation, as well as facilities for making hyperlinks and hyperjumps, and inserting custom applets.
Use of the present invention may generate a great deal of aggregate and individual information on spatial attention density, navigation routes and other patterns. These data are of commercial value.

CLAIMS
1. A method, comprising: zooming into or out of an image having at least one object, wherein at least some elements of the at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
2. The method of claim 1, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
3. The method of claim 2, wherein at least one of d' and a may vary for one or more elements of the object.
4. The method of claim 2, wherein the power law is -1 < a < 0 within a range of zoom levels z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
5. The method of claim 4, wherein at least one of z0, z1, d' and a may vary for one or more elements of the object.
6. The method of claim 1, wherein at least some elements of the at least one object are also scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
7. The method of claim 6, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object, c is a constant, d is a real or imputed linear size in physical units of the one or more elements of the object, and z is the zoom level in physical linear size/pixel.
8. The method of claim 6, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements at a given zoom level are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level.
9. The method of claim 8, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road at a given zoom level is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level.
10. A storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: zooming into or out of an image having at least one object, wherein at least some elements of the at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
11. The storage medium of claim 10, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
12. The method of claim 11, wherein at least one of d' and a may vary for one or more elements of the object.
13. The storage medium of claim 11, wherein the scale power is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
14. The storage medium of claim 13, wherein at least one of z0 and z1 may vary for one or more elements of the object.
15. The storage medium of claim 9, wherein at least some elements of the at least one object are also scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
16. The storage medium of claim 15, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object, c is a constant, d is a real or imputed linear size in physical units of the one or more elements of the object, and z is the zoom level in physical linear size/pixel.
17. The storage medium of claim 15, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements at a given zoom level are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level.
18. The storage medium of claim 17, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road at a given zoom level is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level.
19. An apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: zooming into or out of an image having at least one object, wherein at least some elements of the at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
20. The apparatus of claim 19, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
21. The method of claim 20, wherein at least one of d' and a may vary for one or more elements of the object.
22. The apparatus of claim 20, wherein the power law is -1 < a < 0 within a range of zoom levels z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
23. The apparatus of claim 22, wherein at least one of z0 and z1 may vary for one or more elements of the object.
24. The apparatus of claim 19, wherein at least some elements of the at least one object are also scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
25. The apparatus of claim 24, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object, c is a constant, d is a real or imputed linear size in physical units of the one or more elements of the object, and z is the zoom level in physical linear size/pixel.
26. The apparatus of claim 24, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements at a given zoom level are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level.
27. The apparatus of claim 26, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road at a given zoom level is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level.
28. A method, comprising: preparing a plurality of images of different zoom levels of at least one object, wherein at least some elements of the at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels.
29. The method of claim 28, wherein the images are pre-rendered at a source terminal for delivery to a client terminal.
30. The method of claim 28, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
31. The method of claim 30, wherein at least one of d' and a may vary for one or more elements of the object.
32. The method of claim 30, wherein the power law is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
33. The method of claim 32, wherein at least one of z0 and z1 may vary for one or more elements of the object.
34. The method of claim 28, wherein at least some elements of the at least one object are also scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
35. The method of claim 34, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object, c is a constant, d is a real or imputed linear size in physical units of the one or more elements of the object, and z is the zoom level in physical linear size/pixel.
36. The method of claim 34, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements at a given zoom level are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level.
37. The method of claim 36, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road at a given zoom level is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level.
38. The method of claim 57, wherein the power law is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
39. The method of claim 38, wherein at least one of z0 and z1 may vary for one or more of the roads of the roadmap.
40. A method, comprising: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
41. The method of claim 40, wherein at least some roads of the roadmap are scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling is at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level.
42. The method of claim 41, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a real or imputed linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel.
43. The method of claim 41, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
44. The method of claim 43, wherein at least one of d' and a may vary for one or more elements of the object.
45. The method of claim 43, wherein the power law is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
46. The method of claim 45, wherein at least one of z0 and z1 may vary for one or more roads of the roadmap.
47. The method of claim 40, wherein: the roads of the roadmap are of varying degrees of coarseness; and the scaling of the roads in a given pre-rendered image are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such roads; and (ii) the zoom level of the given pre-rendered image.
48. A method, comprising: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
49. The method of claim 48, wherein the blending step includes performing at least one of alpha-blending, trilinear interpolation, and bicubic-linear interpolation.
50. The method of claim 48, wherein the number of pre-rendered images are such that blending therebetween provides the appearance of smooth navigation.
51. The method of claim 48, wherein the zoom levels and the scaling of the pre-rendered images are selected such that respective linear sizes in pixels p of a given one or more of the elements of the object do not vary by more than a predetermined number of pixels as between one pre-rendered image and another pre-rendered image of higher resolution.
52. The method of claim 51, wherein the predetermined number of pixels is about two.
53. The method of claim 50, further comprising downsampling a lowest resolution one of the pre-rendered images to facilitate navigation to zoom levels beyond a zoom level of the lowest resolution one of the pre-rendered images.
54. The method of claim 48, wherein the physically proportional scaling may be expressed by the following formula: $p = c \cdot d / z$, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a real or imputed linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel.
55. The method of claim 48, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
56. The method of claim 55, wherein at least one of d' and a may vary for one or more elements of the object.
57. The method of claim 55, wherein the power law is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
58. The method of claim 57, wherein at least one of z0 and z1 may vary for one or more elements of the object.
59. The method of claim 48, wherein the plurality of pre-rendered images are received by the client terminal over a packetized network.
60. The method of claim 59, wherein the packetized network is the Internet.
61. The method of claim 48, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements in a given pre-rendered image are physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given pre-rendered image.
62. The method of claim 61, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road in a given pre-rendered image is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given pre-rendered image.
63. The method of claim 62, wherein the non-physically proportional scaling may be expressed by the following formula: $p = d' \cdot z^{a}$, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel.
64. The method of claim 63, wherein at least one of d' and a may vary for one or more elements of the object.
65. The method of claim 63, wherein the power law is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
66. The method of claim 65, wherein at least one of z0 and z1 may vary for one or more of the roads of the roadmap.
67. A method, comprising: transmitting a plurality of images of varying zoom levels of at least one object to a terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of images at the terminal; issuing one or more user navigation commands including zooming information using the terminal; blending at least two of the images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the terminal.
68. The method of claim 67, wherein the blending step includes performing at least one of alpha-blending, trilinear interpolation, and bicubic-linear interpolation.
69. The method of claim 67, wherein the number of images is such that blending therebetween provides the appearance of smooth navigation.
70. The method of claim 67, wherein the zoom levels and the scaling of the pre-rendered images are selected such that respective linear sizes in pixels p of a given one or more of the elements of the object do not vary by more than a predetermined number of pixels between one pre-rendered image and another pre-rendered image of higher resolution.
71. The method of claim 70, wherein the predetermined number of pixels is about two.
72. The method of claim 69, further comprising downsampling a lowest resolution one of the images to facilitate navigation to zoom levels beyond a zoom level of the lowest resolution one of the images.
73. The method of claim 69, wherein the physically proportional scaling may be expressed by the following formula: p = c • d/z, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a real or imputed linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel.
74. The method of claim 69, wherein the non-physically proportional scaling may be expressed by the following formula: p = d' • z^a, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
75. The method of claim 74, wherein at least one of d' and a may vary for one or more elements of the object.
76. The method of claim 74, wherein the power law is -1 < a < 0 within a range of zoom levels z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
77. The method of claim 76, wherein at least one of z0 and z1 may vary for one or more elements of the object.
78. The method of claim 69, wherein the plurality of images are received by the terminal over a packetized network.
79. The method of claim 78, wherein the packetized network is the Internet.
80. The method of claim 69, wherein: the elements of the object are of varying degrees of coarseness; and the scaling of the elements in a given image is physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given pre-rendered image.
81. The method of claim 80, wherein: the object is a roadmap, the elements of the object are roads, and the varying degrees of coarseness are road hierarchies; and the scaling of a given road in a given image is physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given pre-rendered image.
82. The method of claim 81, wherein the non-physically proportional scaling may be expressed by the following formula: p = d' • z^a, where p is a linear size in pixels of one or more elements of the object at the zoom level, d' is an imputed linear size of the one or more elements of the object in physical units, z is the zoom level in units of physical linear size/pixel, and a is a power law where a ≠ -1.
83. The method of claim 82, wherein at least one of d' and a may vary for one or more elements of the object.
84. The method of claim 82, wherein the scale power is -1 < a < 0 within a range of zoom levels between z0 and z1, where z0 is of a lower physical linear size/pixel than z1.
85. The method of claim 84, wherein at least one of z0 and z1 may vary for one or more of the roads of the roadmap.
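
The two scaling rules recited in the claims above can be compared numerically. What follows is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function names, the sample element width of 16 physical units, the choice c = 1, and the exponent a = -0.5 are all hypothetical values introduced for illustration. It evaluates the physically proportional rule p = c • d/z and the power-law rule p = d' • z^a for the same element at several zoom levels.

```python
def physical_size_px(d, z, c=1.0):
    """Physically proportional scaling: p = c * d / z.

    d is the real or imputed linear size in physical units and
    z is the zoom level in physical linear size per pixel.
    """
    if z <= 0:
        raise ValueError("zoom level z must be positive")
    return c * d / z


def power_law_size_px(d_imputed, z, a):
    """Non-physically proportional scaling: p = d' * z**a, with a != -1."""
    if z <= 0:
        raise ValueError("zoom level z must be positive")
    if a == -1:
        raise ValueError("a == -1 corresponds to physically proportional scaling")
    return d_imputed * z ** a


if __name__ == "__main__":
    # Hypothetical element 16 physical units wide, drawn at coarser and coarser zooms.
    for z in (1.0, 4.0, 16.0, 64.0):
        print(z, physical_size_px(16.0, z), power_law_size_px(16.0, z, -0.5))
```

With an exponent in the claimed range -1 < a < 0, the element still shrinks as z grows, but more slowly than under physical scaling, so it remains visible at coarser zoom levels.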
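
The blending step of claims 67 and 68 can be sketched in the same spirit. This is an illustrative Python example only: the claims name alpha-blending but do not specify how the blend weight is derived, so the log-linear interpolation of the weight between two zoom levels is an assumption, as are the function names and the representation of images as nested lists of grayscale values. The sketch also assumes both images have already been resampled to the same pixel dimensions.

```python
import math


def blend_weight(z, z_fine, z_coarse):
    """Weight given to the coarser image for a requested zoom level z.

    Assumption: the weight varies linearly in log(z) between the two
    pre-rendered zoom levels; z is clamped to [z_fine, z_coarse].
    """
    if not 0 < z_fine < z_coarse:
        raise ValueError("require 0 < z_fine < z_coarse")
    z = min(max(z, z_fine), z_coarse)
    return (math.log(z) - math.log(z_fine)) / (math.log(z_coarse) - math.log(z_fine))


def alpha_blend(img_fine, img_coarse, w):
    """Per-pixel alpha blend: (1 - w) * fine + w * coarse."""
    if len(img_fine) != len(img_coarse):
        raise ValueError("images must have the same number of rows")
    blended = []
    for row_fine, row_coarse in zip(img_fine, img_coarse):
        if len(row_fine) != len(row_coarse):
            raise ValueError("images must have the same row lengths")
        blended.append([(1.0 - w) * f + w * c for f, c in zip(row_fine, row_coarse)])
    return blended


if __name__ == "__main__":
    # Hypothetical 1x2 images pre-rendered at zoom levels 1.0 and 4.0.
    w = blend_weight(2.0, 1.0, 4.0)
    print(w, alpha_blend([[0.0, 100.0]], [[100.0, 0.0]], w))
```

Clamping z keeps the weight in [0, 1], so a request outside the bracketing pair simply returns one of the two pre-rendered images unchanged.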
PCT/US2005/008812 2004-03-17 2005-03-16 Methods and apparatus for navigating an image WO2005089403A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA2558833A CA2558833C (en) 2004-03-17 2005-03-16 Methods and apparatus for navigating an image
JP2007504079A JP4861978B2 (en) 2004-03-17 2005-03-16 Method and apparatus for navigating images
EP05740967A EP1759354A2 (en) 2004-03-17 2005-03-16 Methods and apparatus for navigating an image

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US55380304P 2004-03-17 2004-03-17
US10/803,010 US7133054B2 (en) 2004-03-17 2004-03-17 Methods and apparatus for navigating an image
US10/803,010 2004-03-17
US60/553,803 2004-03-17

Publications (2)

Publication Number Publication Date
WO2005089403A2 WO2005089403A2 (en) 2005-09-29
WO2005089403A3 WO2005089403A3 (en) 2009-02-26

Family

ID=34994319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/008812 WO2005089403A2 (en) 2004-03-17 2005-03-16 Methods and apparatus for navigating an image

Country Status (4)

Country Link
EP (1) EP1759354A2 (en)
JP (1) JP4861978B2 (en)
CA (1) CA2558833C (en)
WO (1) WO2005089403A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008059582A (en) * 2006-08-29 2008-03-13 Samsung Electronics Co Ltd Level of detail value calculating method for reducing power consumption, and 3-dimensional rendering system using the same
WO2010043959A1 (en) * 2008-10-15 2010-04-22 Nokia Corporation Method and apparatus for generating an image
EP2146861B1 (en) * 2007-04-17 2012-12-26 Volkswagen Aktiengesellschaft Display device for a vehicle for the display of information relating to the operation of the vehicle and method for the display of the information thereof
US8935292B2 (en) 2008-10-15 2015-01-13 Nokia Corporation Method and apparatus for providing a media object
EP2556490A4 (en) * 2010-04-05 2017-06-28 Microsoft Technology Licensing, LLC Generation of multi-resolution image pyramids

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332079B (en) * 2011-09-16 2013-12-04 南京师范大学 GIS (geographic information system) vector data disguising and restoring method based on error random interference

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030142099A1 (en) * 2002-01-30 2003-07-31 Deering Michael F. Graphics system configured to switch between multiple sample buffer contexts
US20030156738A1 (en) * 2002-01-02 2003-08-21 Gerson Jonas Elliott Designing tread with fractal characteristics
US20030231190A1 (en) * 2002-03-15 2003-12-18 Bjorn Jawerth Methods and systems for downloading and viewing maps

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000046566A (en) * 1998-07-29 2000-02-18 Aisin Aw Co Ltd Map display device and storage medium
JP2002245473A (en) * 2001-02-16 2002-08-30 Hitachi Eng Co Ltd Method and device for map display
DE10226885A1 (en) * 2002-06-17 2004-01-08 Herman/Becker Automotive Systems (Xsys Division) Gmbh Method and driver information system for displaying a selected map section

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008059582A (en) * 2006-08-29 2008-03-13 Samsung Electronics Co Ltd Level of detail value calculating method for reducing power consumption, and 3-dimensional rendering system using the same
EP2146861B1 (en) * 2007-04-17 2012-12-26 Volkswagen Aktiengesellschaft Display device for a vehicle for the display of information relating to the operation of the vehicle and method for the display of the information thereof
WO2010043959A1 (en) * 2008-10-15 2010-04-22 Nokia Corporation Method and apparatus for generating an image
CN102187369A (en) * 2008-10-15 2011-09-14 诺基亚公司 Method and apparatus for generating an image
US8935292B2 (en) 2008-10-15 2015-01-13 Nokia Corporation Method and apparatus for providing a media object
US9218682B2 (en) 2008-10-15 2015-12-22 Nokia Technologies Oy Method and apparatus for generating an image
US9495422B2 (en) 2008-10-15 2016-11-15 Nokia Technologies Oy Method and apparatus for providing a media object
US10445916B2 (en) 2008-10-15 2019-10-15 Nokia Technologies Oy Method and apparatus for generating an image
EP2556490A4 (en) * 2010-04-05 2017-06-28 Microsoft Technology Licensing, LLC Generation of multi-resolution image pyramids

Also Published As

Publication number Publication date
CA2558833C (en) 2014-12-30
JP2008501160A (en) 2008-01-17
EP1759354A2 (en) 2007-03-07
JP4861978B2 (en) 2012-01-25
CA2558833A1 (en) 2005-09-29
WO2005089403A3 (en) 2009-02-26

Similar Documents

Publication Publication Date Title
CA2812008C (en) Methods and apparatus for navigating an image
AU2006230233B2 (en) System and method for transferring web page data
JP4831071B2 (en) System and method for managing communication and / or storage of image data
WO2005089434A2 (en) Method for encoding and serving geospatial or other vector data as images
US7023456B2 (en) Method of handling context during scaling with a display
US7075535B2 (en) System and method for exact rendering in a zooming user interface
JP4410465B2 (en) Display method using elastic display space
US7287220B2 (en) Methods and systems for displaying media in a scaled manner and/or orientation
US6674445B1 (en) Generalized, differentially encoded, indexed raster vector data and schema for maps on a personal digital assistant
CN101501664A (en) System and method for transferring web page data
US20070064018A1 (en) Detail-in-context lenses for online maps
WO2008054805A2 (en) Method of client side map rendering with tiled vector data
CA2558833C (en) Methods and apparatus for navigating an image
Möser et al. Context aware terrain visualization for wayfinding and navigation
JP2008535098A (en) System and method for transferring web page data
US20090172570A1 (en) Multiscaled trade cards
KR20030015765A (en) Method and system for providing panorama-typed images on the internet
Perlin et al. Live Paint
CA2425990A1 (en) Elastic presentation space

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL Designated countries for regional patents Kind code of ref document: A2; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
WWE Wipo information: entry into national phase Ref document number: 2558833; Country of ref document: CA
WWE Wipo information: entry into national phase Ref document number: 2005740967; Country of ref document: EP
NENP Non-entry into the national phase in: Ref country code: DE
WWE Wipo information: entry into national phase Ref document number: 2007504079; Country of ref document: JP
WWW Wipo information: withdrawn in national office Country of ref document: DE
WWP Wipo information: published in national office Ref document number: 2005740967; Country of ref document: EP