EP1756521A2 - Method for encoding and serving geospatial or other vector data as images - Google Patents
- Publication number
- EP1756521A2 EP05725818A
- Authority
- EP
- European Patent Office
- Prior art keywords
- layer
- image
- data
- location
- data block
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3863—Structures of map data
- G01C21/3867—Geometry of map features, e.g. shape points, polygons or for simplified maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3863—Structures of map data
- G01C21/387—Organisation of map data, e.g. version management or database structures
- G01C21/3878—Hierarchical structures, e.g. layering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
- G06V30/422—Technical drawings; Geographical maps
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B29/00—Maps; Plans; Charts; Diagrams, e.g. route diagram
- G09B29/10—Map spot or coordinate position indicators; Map reading aids
- G09B29/106—Map spot or coordinate position indicators; Map reading aids using electronic means
Definitions
- One or more embodiments of the present invention relate to an extension of these selectively de-compressible image compression and transmission technologies to geospatial or schematic data.
- the one or more embodiments combine and extend methods described in the following documents, which are included in an appendix of this specification: (1) "Method for Spatially Encoding Large Texts, Metadata, and Other Coherently Accessed Non-Image Data"; (2) “Methods And Apparatus For Navigating An Image”; (3) "System and Method For The
- the invention provides a method of transmitting information indicative of an image comprising transmitting one or more nodes of information as a first image, transmitting a second image including information indicative of vectors defining characteristics to be utilized for display at predetermined locations in the first image, and transmitting a third image comprising a mapping between the first and second images such that a receiver of the first and second images can correlate the first and second images to utilize the vectors at the predetermined locations.
- the first image is a map and the second image is a set of vectors defining visual data that is only displayed at predetermined levels of detail.
- the first image is a map.
- the second image includes hyperlinks.
- the first image is a map
- the second image includes a set of vectors and wherein plural ones of the vectors are located at locations corresponding to locations on the first image wherein the vectors are to be applied, and plural ones of the vectors are located at locations on the second image which do not correspond to the locations on the first image wherein the vectors are to be applied.
- the method further comprises utilizing an efficient packing algorithm to construct the second image to decrease an amount of space between a location on the second image at which one or more vectors appear, and a location on the first image where the one or more vectors are to be applied.
- the vectors include information to launch a node or sub-node.
- the invention provides a method of rendering an image comprising receiving a first, second, and third set of data from a remote computer, the first data set being representative of an image, the second being representative of vectors defining characteristics of the image at prescribed locations, and the third serving to prescribe the locations.
- the prescribed locations are street locations on a map.
- the vectors represent sub-nodes and include information indicative of under what conditions the sub-nodes should launch.
- the vectors include hyperlinks to at least one of the group consisting of: external content, such as advertising materials, and/or embedded visual content.
- the vectors include hyperlinks to advertising materials.
- the vectors include information specifying a rendering method for portions of an image at predetermined locations in the image.
- the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
- each data block describes at least one characteristic of the feature corresponding to each data block.
- the method further comprises providing a third layer of the image, the third layer including pointers, each pointer corresponding to a respective one of the features and a respective one of the data blocks.
- each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location.
- the describing comprises providing text data for at least one feature.
- the describing comprises providing a graphical illustration of at least one feature.
- the describing comprises providing geometric data indicative of at least one feature.
- the describing comprises providing two-dimensional or three-dimensional shape or contour information for at least one feature.
- the describing comprises providing color information for at least one feature.
- the describing comprises providing advertising or hyperlinking information relating to at least one feature.
- the describing comprises providing at least one link to an external web site relating to at least one feature.
- the describing comprises providing embedded visual content relating to at least one feature.
- the describing comprises providing advertising information relating to at least one feature.
- the describing comprises: providing schematic detail of a road segment.
- the describing comprises: providing schematic detail for at least one of the group consisting of: at least one road, at least one park, a topography of a region, a hydrography of a body of water, at least one building, at least one public restroom, at least one wireless fidelity station, at least one power line, and at least one stadium.
- the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
- the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
- the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
- the second layer and the third layer each have a size and shape corresponding to a size and a shape of the first layer.
- the method further comprises: forming a map image from a combination of the first layer, the second layer, and the third layer.
- the method further comprises: flattening data in the map image.
- each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location.
- the indicating comprises identifying an offset in two dimensions.
- each dimension of the offset is expressed in units corresponding to an integral number of pixels, e.g. 2 or 4.
- the indicating comprises identifying an offset as a one-dimensional distance along a Hilbert curve.
- the offset along the one-dimensional curve is expressed in units of pixels.
- the offset along the one-dimensional curve is expressed in units corresponding to an integral number of pixels.
- the offset along the one-dimensional curve is expressed in units corresponding to integral multiples of pixels.
- placing each data block comprises: locating each data block employing a packing algorithm to achieve a maximum proximity of each data block to a target location for each data block in the second layer, the target location in the second layer corresponding to the location in the first layer of the feature corresponding to each data block.
- the packing algorithm ensures that no two data blocks in the second layer overlap each other.
- the maximum proximity is determined based on a shortest straight-line distance between each data block's location and the target location for each data block.
- the maximum proximity is determined based on a sum of absolute values of offsets in each of two dimensions between each data block's location and the target location for each data block.
- the maximum proximity is determined based on a minimum Hilbert curve length between each data block's location and the target location for each data block.
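The packing claims above can be illustrated with a minimal greedy sketch. It is not taken from the patent: the function names, the brute-force candidate search, and the order in which blocks are placed are all assumptions made for illustration. Only the straight-line and sum-of-absolute-offsets metrics are shown; the Hilbert curve length metric is omitted.

```python
from itertools import product


def overlaps(a, b):
    """True if rectangles a and b, each given as (x, y, w, h), share any pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def manhattan(p, q):
    """Sum of absolute offsets in each of the two dimensions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Shortest straight-line distance."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5


def greedy_pack(blocks, layer_w, layer_h, metric=manhattan):
    """Place each block as near to its target as possible without overlap.

    blocks: list of (target_x, target_y, w, h), handled in the given order.
    Returns a list of placed rectangles (x, y, w, h) in the same order.
    """
    placed = []
    for tx, ty, w, h in blocks:
        # Hypothetical brute force: try every position, nearest first.
        candidates = sorted(
            product(range(layer_w - w + 1), range(layer_h - h + 1)),
            key=lambda pos: metric(pos, (tx, ty)),
        )
        for x, y in candidates:
            rect = (x, y, w, h)
            if not any(overlaps(rect, other) for other in placed):
                placed.append(rect)
                break
        else:
            raise ValueError("no free position left for block")
    return placed


if __name__ == "__main__":
    # Two 2x2 blocks compete for the same target in an 8x8 layer.
    print(greedy_pack([(2, 2, 2, 2), (2, 2, 2, 2)], 8, 8))
```

A greedy pass like this is simple but order-dependent: blocks placed early take the best positions, so a practical implementation might place the largest or most crowded blocks first.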
- the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
- the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
- FIG. 1 illustrates a prerendered layer of a roadmap image including a plurality of features suitable for description in data blocks in accordance with one or more embodiments of the present invention
- FIG. 2 illustrates the roadmap of FIG. 1 and the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention
- FIG. 3 illustrates a concentrated set of road segments belonging to a plurality of roads with a main road as well as pointers and data blocks corresponding to the road segments in a region having a high concentration of intersections in accordance with one or more embodiments of the present invention
- FIG. 4 illustrates test output of a greedy rectangle packing algorithm for three cases in accordance with one or more embodiments of the present invention
- FIG. 5A is an image of binary 8-bit data taken from a dense region of roadmap data image of the Virgin Islands before the flattening of such data in accordance with one or more embodiments of the present invention
- FIG. 5B is an image of binary 8-bit data taken from a dense region of roadmap data image of the Virgin Islands after the flattening of such data in accordance with one or more embodiments of the present invention
- FIG. 6 illustrates a first-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
- FIG. 7 illustrates a second-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
- FIG. 8 illustrates a third-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
- FIG. 9 illustrates a fourth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
- FIG. 10 illustrates a fifth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
- FIG. 11 depicts an image of one of the U.S. Virgin Islands which incorporates 4-pixel by 4-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
- FIG. 12 depicts an image of one of the U.S. Virgin Islands which incorporates 6-pixel by 6-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
- FIG. 13 depicts an image of one of the U.S. Virgin Islands which incorporates 8-pixel by 8-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
- the various aspects of the present invention may be applied in contexts other than encoding and/or serving map data. Indeed, the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety.
- the features of the present invention may be used to encode and/or transmit images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to encoding and/or serving images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, encoding and/or serving of images of a map.
- In (2), the concept of continuous multi-scale roadmap rendering was introduced.
- the basis for one or more embodiments of the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of roads, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
- Blending coefficients and a choice of image resolutions can both be varied depending upon the zoom scale. The net result is that a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as categories of roads appearing or disappearing as the zoom scale is changed.
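As a rough illustration of zoom-dependent blending, the sketch below picks two adjacent images from the pre-rendered stack and a pair of blending weights for a requested scale. The details are assumptions, not the method of (2): the stack is assumed to double in meters/pixel from one level to the next, starting at 15 meters/pixel, the weights are assumed to be linear in the logarithm of the scale, and `blend_for_scale` is a hypothetical name.

```python
import math


def blend_for_scale(scale, base=15.0, levels=8):
    """Pick two adjacent stack levels and their blending weights.

    scale:  requested display scale in meters/pixel.
    base:   scale of the most detailed level (level 0), in meters/pixel.
    levels: number of images in the stack (assumed >= 2); level k is
            assumed to be rendered at base * 2**k meters/pixel.
    Returns (fine_level, coarse_level, fine_weight, coarse_weight).
    """
    # Continuous position in the stack, clamped to the available levels.
    pos = math.log2(scale / base)
    pos = min(max(pos, 0.0), levels - 1)
    fine = min(int(math.floor(pos)), levels - 2)
    t = pos - fine
    return fine, fine + 1, 1.0 - t, t


if __name__ == "__main__":
    for s in (15.0, 20.0, 30.0, 45.0):
        print(s, blend_for_scale(s))
```

Because the weights change continuously with the scale, no category of road appears or disappears abruptly as the user zooms.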
- the most relevant categories can be accentuated. For example, when zoomed out to view the entire country, the largest highways can be strongly weighted, making them stand out clearly, while at the state level, secondary highways can also be weighted strongly enough to be clearly visible.
- all roads are clearly visible, and in the preferred embodiment for geospatial data, all elements are preferably shown at close to their physically correct scale.
- a maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel. However, it is desirable from the user's standpoint to be able to zoom in farther.
- Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution roadmap rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond a static visual presentation.
- a route guidance system may highlight a road or change its color as displayed to a user on a monitor or in print media. This can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
- Vector data may also include street names, addresses, and other information which the client preferably has the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is undesirable, as these labels are preferably drawn in different places and are preferably provided with different sizes depending on the precise location and scale of the client view. Different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
- vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both beneficial to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high.
- the complete vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation.
- some subset of the vector data is beneficial, such as the names of major highways. This subset of the vector data may be included in a low resolution data layer associated with the low resolution pre-rendered layer, with more detailed vector data available in data layers associated with higher resolution pre-rendered layers.
- One or more embodiments of the present invention extend the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2). In the prior art, this would be accomplished using a geospatial database. The database would need to include all relevant vector data, indexed spatially. Such databases present many implementation challenges. In one or more embodiments of the present invention, instead of using conventional databases, we use spatially addressable images, such as those supported by JPEG2000/JPIP, to encode and serve the vector data.
- the prerendered layer is a preferably pre-computed literal rendition of the roadmap, as per (2).
- the pointer layer preferably includes 2*2 pixel blocks which are preferably located in locations within the pointer layer that correspond closely, and sometimes identically, to the locations, within the pre-rendered layer, of the respective features that the pointers correspond to.
- the data layer preferably consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them.
- the prerendered layer may also be in 24-bit color, or in any other color space or bit depth.
- the prerendered layer, the pointer layer, and the data layer are in essence two-dimensional memory spaces for storing various quantities of binary data.
- These three layers preferably correspond to a common two-dimensional image region which is the subject of a roadmap or other two-dimensional image representation to a client.
- the terms "size" and "shape" of a layer generally correspond to the size and shape, respectively, of the two-dimensional image which the data in that layer relates to.
- the prerendered layer, the pointer layer, and the data layer forming a particular map image, for instance, have "sizes" and "shapes" in the two-dimensional image (that is formed from these three layers) that are at least very close to, or possibly identical to, one another.
- the stored data for the three layers are distributed within a physical memory of a data processing system.
- the pertinent "features" in the prerendered layer may be road segments. In a map having 10 road segments, pointer 1 in the pointer layer would correspond to road segment 1 in the prerendered layer and to data block 1 in the data layer.
- pointer 1 is preferably in a location within the pointer layer that corresponds closely, and perhaps identically, to the location of road segment 1 (or more generally "feature 1") within the prerendered layer.
- the size and shape of the three map layers preferably correspond closely to one another to make the desired associations of entries in the respective map layers as seamless as possible within a data processing system configured to access any of the layers and any of the entries in the layers, as needed. It will be appreciated that while the discussion herein is primarily directed to maps formed from three layers of data, the present invention could be practiced while using fewer or more than three layers of data, and all such variations are intended to be within the scope of the present invention.
- Because the three map layers are preferably of equal size and in registration with each other, they can be overlaid in different colors (red, green, blue on a computer display, or cyan, magenta, yellow for print media) to produce a single color image.
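The overlay of three registered layers into one color image can be sketched as follows. The channel assignment (prerendered layer in red, pointer layer in green, data layer in blue) and the function names are assumptions for illustration; the text only states that the layers can be overlaid in different colors.

```python
def overlay_layers(prerendered, pointer, data):
    """Combine three equal-size 8-bit layers into rows of (R, G, B) pixels.

    Each layer is a list of rows, and each row is a sequence of ints 0..255.
    Assumed channel order: prerendered -> R, pointer -> G, data -> B.
    """
    shapes = {(len(layer), len(layer[0])) for layer in (prerendered, pointer, data)}
    if len(shapes) != 1:
        raise ValueError("layers must have identical size and shape")
    return [list(zip(r, g, b)) for r, g, b in zip(prerendered, pointer, data)]


def split_layers(rgb):
    """Recover the three layers from an overlaid image."""
    prerendered = [[px[0] for px in row] for row in rgb]
    pointer = [[px[1] for px in row] for row in rgb]
    data = [[px[2] for px in row] for row in rgb]
    return prerendered, pointer, data


if __name__ == "__main__":
    pre = [[255, 0], [0, 255]]
    ptr = [[0, 7], [0, 0]]
    dat = [[0, 0], [9, 0]]
    image = overlay_layers(pre, ptr, dat)
    print(image)
```

Since the layers are in registration, pixel (x, y) of the combined image carries the prerendered, pointer, and data values for the same map location.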
- FIGS. 1-3 may be displayed in color (either on an electronic display or on print media), and may be stored on the server side as a single color JPEG2000. However, for the sake of simplicity, FIGS. 1-3 are presented in black and white in this application. Preferably, only the prerendered layer would actually be visible in this form on the client's display.
- FIG. 1 illustrates a prerendered layer of a roadmap including a plurality of features numbered 102 through 124.
- the features shown are all road segments. However, features may include many other entities such as sports arenas, parks, large buildings and so forth.
- the region shown in FIG. 1 is included for illustrative purposes and does not correspond to any real-world city or street layout.
- FIG. 2 illustrates the roadmap of FIG. 1 as well as the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention.
- Road segment 102 is shown in FIG. 2 and the other road segments from FIG. 1 are reproduced in FIG. 2. However, due to space limitations, the reference numerals for the other eleven road segments (104-124) are not shown in FIG. 2.
- pointers are shown as dark grey blocks, and data blocks are shown as larger light grey blocks.
- FIG. 2 illustrates a region having a relatively low concentration of road segments per unit area
- pointers 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222 and 224
- data blocks (242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262 and 264) can be placed in locations within the third layer (data layer) of map 200 that correspond reasonably closely to the locations within the prerendered layer of map 200 of the respective features to which the data blocks correspond.
- FIG. 3 illustrates a concentrated set of road segments of a plurality of minor roads 320 and a smaller number of main roads 310 as well as pointers and data blocks corresponding to their respective road segments in a region having a high concentration of road segments in accordance with one or more embodiments of the present invention.
- Reference numeral 330 refers to all of the pointers, and reference numeral 340 refers to all of the data blocks.
- the concentration of features is too high to enable all of the pointers or all of the data blocks to be located in locations within their respective layers that correspond exactly to the locations of the features in layer one that they correspond to.
- the degree of offset for the pointer locations may be minor or significant depending upon the degree of crowding.
- the concentration of road segments in FIG. 3 precludes placing the data blocks in locations in layer three closely corresponding to the locations of the respective features within layer one that the data blocks correspond to.
- data blocks 340 are distributed as close as possible to their corresponding pointers, making use of a nearby empty area 350 which is beneficially devoid of features, thus allowing the data blocks 340 to overflow into empty area 350.
- Empty area 350 may be any type of area that does not have a significant concentration of features having associated pointers and data blocks, such as, for example, a body of water or a farm.
- a packing algorithm may be employed to efficiently place data blocks 340 within map 300. This type of algorithm is discussed later in this application, and the discussion thereof is therefore not repeated in this section.
- the client can request from the server the relevant portions of all three image layers, as shown.
- the prerendered layer is generally the only one of the three image layers that displays image components representing the physical layout of a geographical area.
- the other two image layers preferably specify pointers and data blocks corresponding to features in the prerendered layer.
- the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
- the corresponding data block, in turn, can begin with two 16-bit values (four pixels) specifying the data block width and height.
- the width is specified first, and is constrained to have a magnitude of at least 2 pixels, hence avoiding ambiguities in reading the width and height.
- the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
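The pointer and data block layout described above can be sketched with 8-bit layers held as rows of bytes. Several details are assumptions, because the text does not fix them: big-endian byte order, signed offsets, the x component in the top row of the 2x2 block and the y component in the bottom row, and raster order inside each data block. All function names are hypothetical.

```python
import struct


def write_pointer(layer, px, py, dx, dy):
    """Store a (dx, dy) offset in the 2x2 block whose top-left pixel is (px, py)."""
    assert px % 2 == 0 and py % 2 == 0, "pointers are aligned on a 2x2 pixel grid"
    xb = struct.pack(">h", dx)  # assumed: big-endian, signed 16-bit
    yb = struct.pack(">h", dy)
    layer[py][px], layer[py][px + 1] = xb[0], xb[1]          # top row: x
    layer[py + 1][px], layer[py + 1][px + 1] = yb[0], yb[1]  # bottom row: y


def read_pointer(layer, px, py):
    """Return the (dx, dy) offset stored at pointer block (px, py)."""
    dx = struct.unpack(">h", bytes(layer[py][px:px + 2]))[0]
    dy = struct.unpack(">h", bytes(layer[py + 1][px:px + 2]))[0]
    return dx, dy


def write_block(layer, bx, by, width, height, payload):
    """Write a data block: 16-bit width, 16-bit height, then payload bytes."""
    assert width >= 2, "width of at least 2 keeps the width field in one row"
    raw = struct.pack(">HH", width, height) + payload
    assert len(raw) <= width * height, "payload does not fit in the block"
    raw = raw.ljust(width * height, b"\0")
    for row in range(height):
        layer[by + row][bx:bx + width] = raw[row * width:(row + 1) * width]


def read_block(layer, bx, by):
    """Read the block starting at (bx, by); returns (width, height, payload)."""
    width = struct.unpack(">H", bytes(layer[by][bx:bx + 2]))[0]

    def pixel(i):
        # i-th pixel of the block in raster order, given the known width.
        return layer[by + i // width][bx + i % width]

    height = (pixel(2) << 8) | pixel(3)
    payload = bytes(pixel(i) for i in range(4, width * height))
    return width, height, payload


if __name__ == "__main__":
    pointer_layer = [bytearray(16) for _ in range(16)]
    data_layer = [bytearray(16) for _ in range(16)]
    write_block(data_layer, 7, 3, 4, 3, b"Main St")
    write_pointer(pointer_layer, 4, 6, 7 - 4, 3 - 6)
    dx, dy = read_pointer(pointer_layer, 4, 6)
    print(read_block(data_layer, 4 + dx, 6 + dy))
```

Reading the width first is what makes the header unambiguous: once the width is known, the reader knows where the block's raster rows wrap, and can locate the height field and the payload.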
- data blocks may contain street-map information including street names, address ranges, and vector representations.
Compression:
- One existing solution involves sending requests to a spatial database for all relevant textual/vector information within a window of interest.
- the server replies with a certain volume of text.
- Existing spatial database systems send back the information substantially as plain text.
- wavelet compression can be applied, thereby enabling the server to satisfy the data request while sending a much smaller quantity of data than would existing systems.
- Areas with no data storage that are located between data-storage areas on the data and pointer layers create very little waste. They would create waste if the image were being transmitted uncompressed in raster order, but because these areas have zero complexity, they can be compressed into a very small number of bits in the wavelet representation.
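The point that empty areas cost almost nothing once compressed can be demonstrated with a small experiment. Note the substitution: the sketch uses DEFLATE via Python's `zlib` module as a stand-in, not the wavelet coding the text refers to, so it illustrates only the general effect and not the actual bit counts of a wavelet representation. The layer size and block positions are arbitrary assumptions.

```python
import zlib

WIDTH = HEIGHT = 256  # assumed layer size, one byte per pixel


def make_sparse_layer():
    """Build a mostly empty layer with three small runs of non-zero data."""
    layer = bytearray(WIDTH * HEIGHT)
    block = bytes(range(1, 65))  # 64 bytes standing in for a data block
    for start in (1000, 30000, 60000):
        layer[start:start + len(block)] = block
    return bytes(layer)


raw = make_sparse_layer()
packed = zlib.compress(raw)  # DEFLATE, used here in place of wavelet coding

if __name__ == "__main__":
    print("uncompressed bytes:", len(raw))
    print("compressed bytes:  ", len(packed))
```

Sent uncompressed in raster order, every empty pixel would cost a full byte; after compression the long runs of zeros shrink to a small fraction of that.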
- Typical browsing patterns involve gradual zooming and panning.
- Multi-resolution streaming image browsing technologies are designed to work well in this context.
- Complete information can be transmitted for the window of interest, and partial information can be transmitted for areas immediately surrounding the window of interest.
- Upon panning or other movement preferably only the relevant new information is transmitted (the "delta"). All of this can be done in a very efficient manner.
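The "delta" sent after a pan can be sketched as a set difference over fixed-size tiles. This is a simplification under stated assumptions: real streaming image protocols address finer-grained units than square tiles, and the 256-pixel tile size, the `(x, y, w, h)` window format, and the function names are all hypothetical.

```python
def tiles_for_window(x, y, w, h, tile=256):
    """Set of (col, row) tile indices that intersect the window."""
    cols = range(x // tile, (x + w - 1) // tile + 1)
    rows = range(y // tile, (y + h - 1) // tile + 1)
    return {(c, r) for c in cols for r in rows}


def pan_delta(old_window, new_window, tile=256):
    """Tiles needed for the new window that the old window did not cover."""
    old_tiles = tiles_for_window(*old_window, tile=tile)
    new_tiles = tiles_for_window(*new_window, tile=tile)
    return new_tiles - old_tiles


if __name__ == "__main__":
    # Pan a 512x512 window one tile to the right.
    print(sorted(pan_delta((0, 0, 512, 512), (256, 0, 512, 512))))
```

Only the newly exposed column of tiles is requested; tiles already held by the client are reused.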
- An especially large data block, for example, may be partially transmitted well before the window of interest intersects the data block's anchor point.
- the pointer layer shows how distant a data block is from the pointer it corresponds to.
- the data blocks can be centered directly on the pointer positions. In this case, all data would be perfectly local. In urban areas, however, data begins to "crowd", and data blocks may be in positions offset from the respective pointers (in the pointer image) and the respective features (in the prerendered image) that the data blocks correspond to.
- an upper limit can be imposed on the maximum displacement of a data block from the feature to which it corresponds in the prerendered image.
- Crowding may be visible in the data layer and in the pointer layer.
- the appropriate resolutions for different classes of data may vary over space, so that for example small-street vector data might be encodable at 30 meters/pixel in rural areas but only at 12 meters/pixel in urban areas.
- the pointer and data images make data crowding easy to detect and correct, either visually or using data-processing algorithms.
- the resulting hierarchy of data images can help ensure high-performance vector browsing even in a low-bandwidth setting, since the amount of data needed for any given view can be controlled so as to not exceed an upper bound. This kind of upper bound is extremely difficult to enforce, or even to define, in existing geospatial databases.
- One or more aspects of the present invention concern mapping the geospatial database problem onto the remote image browsing problem.
- a great deal of engineering has gone into making remote image browsing work efficiently.
- this includes optimization of caching, bandwidth management, memory usage for both the client and the server, and file representation on the server side.
- This technology is mature and available in a number of implementations, as contrasted with conventional geospatial database technology.
- one or more embodiments of the present invention contemplate bringing about efficient cooperation between a suitably arranged geospatial database and remote image browsing technology that interacts with the geospatial database. Further, in one or more embodiments of the present invention, only a single system need then be used for both image and data browsing, with only a simple adapter for the data on the client side. The foregoing is preferable to having two quasi-independent complex systems, one for image browsing and another for data browsing.
- the Hilbert curve is sometimes also called the Hilbert-Peano curve.
- the Hilbert curve belongs to a family of recursively defined curves known as space-filling curves (see http://mathworld.wolfram.com/HilbertCurve.html, or for the original reference, Hilbert, D. "Über die stetige Abbildung einer Linie auf ein Flächenstück." Math. Ann. 38, 459-460, 1891, which is incorporated by reference herein).
- Hilbert curves of order 1, 2, 3, 4 and 5 are illustrated in FIGS. 6, 7, 8, 9, and 10, respectively.
- the one-dimensional curve fills the entire unit square (formally, becomes dense in the unit square).
- the nth order curves visit 4^n points on the unit square. For the first order case (for 4^1), these points are the corners of the square.
- bit manipulation there are known rapid algorithms for inter-converting between path length on the n th order Hilbert curve and (x,y) coordinates (see Warren, Henry S.
- the Hilbert curve is relevant to the problem of encoding the pointer image because it provides a convenient way of encoding a two-dimensional vector (having components x and y) as a single number d, while preserving the "neighborhood relationship" fairly well.
- the neighborhood relationship means that as the vector position changes slowly, d tends to also change slowly, since, generally, points whose "d" values are close together are also close in two-dimensional space. However, this relationship does not always hold. For example, in the order-2 case, when moving from (1,0) to (2,0), the path distance "d" goes from 1 to 14. It is not possible to fill 2D space with a 1D curve and always preserve the neighborhood relationship.
- the value of every pixel in an 8-bit image can be taken to be a path distance on the 4th order Hilbert curve, thus encoding a vector in the range (0,0)-(15,15), i.e. anywhere on a 16*16 grid.
- the Hilbert coded pixels will have a value which is less than 16 if x and y are both less than 4, and a value less than 4 if x and y are both less than 2.
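The conversion between (x,y) coordinates and path distance d can be sketched as follows. This is the widely published bit-manipulation formulation, not necessarily the routine the authors used; the function names are illustrative, and the curve orientation is an assumption that happens to reproduce the order-2 example above.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy_to_d(order, x, y):
    """Path distance of grid point (x, y) on the Hilbert curve of given order."""
    n = 1 << order          # the curve covers an n-by-n grid (4^order points)
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d_to_xy(order, d):
    """Inverse of xy_to_d: grid point reached after path distance d."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# The order-2 example from the text: neighbors in space, far apart on the path.
print(xy_to_d(2, 1, 0), xy_to_d(2, 2, 0))
```

With order 4, every value of an 8-bit pixel decodes to a point on the 16*16 grid, and vectors near the origin map to small pixel values.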
- because the data packing algorithm preferably packs data blocks as close as possible to the anchor point (where the pointer will be inserted), vectors with small values are common. Moreover, if these vectors are Hilbert-coded, this will translate into small pixel values in the pointer image and hence better image compression performance.
- the value 256 equals 2^8
- the value 4096 equals 2^12.
- the Hilbert coding algorithm is modified to accommodate signed vectors, where x and y values range over both positive and negative numbers.
- the modification involves specifying two extra bits along with the path distance d, identifying the vector quadrant. These two bits are sign bits for x and y.
- the sign bits are assigned to the two lowest-order bit positions of the output value, so that the numerical ranges of coded vectors in each of the quadrants are approximately equal.
- vectors with x and y values between -128 and +127 inclusive can be coded using a 7th order Hilbert curve for each quadrant.
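A minimal sketch of the signed variant follows. The text fixes only that two sign bits occupy the two lowest-order bit positions; which of the two bits carries the x sign, and the mapping of a negative component v to the magnitude -v-1 (so that -128 fits on a 7th order curve), are assumptions made here for illustration. The helper repeats the standard Hilbert conversion so that the block is self-contained.

```python
def hilbert_d(order, x, y):
    """Path distance of (x, y) on a Hilbert curve covering a 2^order grid."""
    n = 1 << order
    d, s = 0, n >> 1
    while s:
        rx, ry = int(bool(x & s)), int(bool(y & s))
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def encode_signed(x, y, order=7):
    """Code a signed vector as (path distance << 2) | sign bits.

    Assumptions (not from the source): bit 0 holds the x sign, bit 1 the
    y sign, and a negative component v is stored as the magnitude -v - 1.
    """
    sx, sy = int(x < 0), int(y < 0)
    mx = -x - 1 if sx else x
    my = -y - 1 if sy else y
    limit = 1 << order
    if not (0 <= mx < limit and 0 <= my < limit):
        raise ValueError("vector out of range for this curve order")
    return (hilbert_d(order, mx, my) << 2) | (sy << 1) | sx


print(encode_signed(0, 0), encode_signed(-1, 0), encode_signed(0, -1))
```

Because the sign bits sit below the path distance, short vectors in any quadrant still produce small codes, and with order 7 the full range -128 to +127 fits in 16 bits.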
- the pointer and data layers are precomputed, just as the prerendered layer is.
- Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective layers.
- features tend to be well separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they preferably fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer.
- dense urban areas however (see FIG.
- a “greedy algorithm” is used to insert a new rectangle as close as possible to a target point and then proceeds as follows:
- this algorithm ultimately succeeds in placing a rectangle, provided that, somewhere in the image, an empty space exists which meets or exceeds the dimensions of the rectangle to be placed.
- This algorithm is "greedy" in the sense that it places a single rectangle at a time. The greedy algorithm does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible.
- a holistic algorithm includes defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points".
- FIG. 4 demonstrates the output of the basic packing algorithm for three cases.
- the algorithm sequentially places a number of rectangles as near as possible to a common point.
- This solution to the rectangle packing problem is provided by way of example.
- most of the rectangles are small and narrow.
- in the center example of the three, large and at least substantially square rectangles are used.
- a mix of small and large rectangles is employed.
- pointer data is organized into two-pixel by two-pixel (meaning two pixels along a row and two pixels along a column) units.
- each pointer is preferably 2x2 (the notation being rows x columns).
- the row size and the column size of pointers may vary.
- pointers may be represented by a single 24-bit color pixel, using 12th order Hilbert coding.
- for a data block, the block area in square pixels is determined by the amount of data which will fit in the block, but this area can fit into rectangles of many different shapes.
- a 24-byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2.
- the data block can also be represented, with one byte left over, within a 5-pixel by 5-pixel block (or "5x5").
- the specifications for a valid ceiling factorization are that its area meet or exceed the dimensions of the data block in question, and that no row or column be entirely wasted. For example, 7x4 or 3x9 are not preferred ceiling factorizations, as they can be reduced to 6x4 and 3x8, respectively.
- block dimensions may be selected based only on a ceiling factorization of the data length.
- "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12).
- the simplest data-block-sizing algorithm can thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes.
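The ceiling factorizations of a data length, and a simple choice among them, can be sketched as below. The scoring function, the `waste_penalty` parameter, and the minimum of two columns (borrowed from the later remark that each data block is 2 or more pixels wide) are illustrative assumptions, not part of the described method.

```python
def ceiling_factorizations(length, min_cols=2):
    """All (rows, cols) shapes that hold `length` bytes with no wasted row
    or column. min_cols=2 is an assumption taken from the block-width remark."""
    shapes = []
    for rows in range(1, length + 1):
        cols = -(-length // rows)            # ceil(length / rows)
        if cols < min_cols:
            continue
        # Dropping a row must lose data; dropping a column already does,
        # because cols is the smallest count that fits.
        if (rows - 1) * cols < length:
            shapes.append((rows, cols))
    return shapes


def choose_block(length, waste_penalty=0.0):
    """Pick the shape minimizing |rows - cols| + waste_penalty * wasted bytes."""
    def score(shape):
        rows, cols = shape
        return abs(rows - cols) + waste_penalty * (rows * cols - length)
    return min(ceiling_factorizations(length), key=score)


print(ceiling_factorizations(24))
print(choose_block(24, waste_penalty=0), choose_block(24, waste_penalty=3))
```

For a 24-byte block this reproduces the exact shapes listed above plus 5x5, and excludes 7x4 and 3x9. With no penalty for wasted bytes the squarest shape (5x5) wins; with a large enough penalty the choice falls back to 4x6.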
- More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point.
- steps 1 and 4 of the algorithm above are then modified as follows: 1) sort the ceiling factorizations having the needed data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes.
- Each of the three map layers — the prerendered layer, the pointer layer, and the data layer — is preferably stored as a JPEG2000 or similar spatially-accessible representation. However, the permissible conditions for data compression are different for different ones of the three layers.
- Compression for the prerendered road layer need not be lossless, but it is beneficial for it to have reasonable perceptual accuracy when displayed.
- At 15m/pixel we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate.
- the pointer and data layers are compressed losslessly, as they contain data which the client needs accurate reconstruction of. Lossless compression is not normally very efficient. Typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best. Techniques have been developed (as described in the "Flattening" section below) to achieve much higher lossless compression rates for the data and pointer layers, while still employing standard wavelet-based JPEG2000 compression.
- an "allocate" function is defined to allocate a given number of bytes (corresponding to pixels). This allocate function preferably differs from the analogous conventional memory allocation function (in C, "malloc") in three ways.
- the allocate function disclosed herein is passed not only a desired number of pixels to allocate, but also a target position on the data image.
- the desired number of pixels are allocated as close as possible to the target position, while avoiding any previously allocated pixels.
- Low data overhead: One or more embodiments of the data image explored to date need some auxiliary data to be encoded. In the preliminary version, this data includes the block dimensions, stored as 16-bit values for width and height. Thus, the overhead was 4 pixels per allocated chunk of data.
- One or more embodiments of the present invention simplify data packing by setting the fundamental spatial unit of data allocation to be an n*m pixel block, where n and m are small but not less than 2*2, aligned on a grid with spacing n*m. These blocks may thus be considered "superpixels”.
- a single allocated chunk typically has more than n*m bytes, and the chunk must therefore span multiple blocks.
- blocks are preferably chained.
- the first two pixels of a block preferably comprise a pointer (which may be Hilbert-encoded, as described above), pointing to the next block in the chain.
- Vectors can be specified in grid units relative to the current block, so that for example, if a block specifies the vector (+1,0), it would mean that the chunk continues in the next block to the right; if a block specifies (-2,-1), it would mean that the chunk continues two blocks to the left and one block up.
- a (0,0) vector (equivalent to a null pointer) may be used to indicate that the current block is the last in the chain.
- Data overhead in this scheme may be high if the block size is very small.
- two of the four pixels per block serve as pointers to the next block, making the overhead data one half of the total data for the block.
- the chunk allocation algorithm works by allocating n*m blocks sequentially. For k bytes, ceil(k/(n*m-2)) blocks are needed, since two pixels of each block are taken by the next-block pointer. Allocation of a block can consist of locating the nearest empty block to the target point and marking it as full. After the required number of blocks are allocated, data and next-block pointers are then written to these blocks. "Nearest" may be defined using a variety of measures, but the four choices with useful properties are: 1) Euclidean (L2) norm. This will select the block with the shortest straight-line distance to the target, filling up blocks in concentric rings.
- This norm has the advantages that, like the Hilbert curve norm, it uniquely defines the "nearest" non-allocated block. Assuming that there are no collisions with pre-existing full blocks, sequential blocks are adjacent, thereby forming an expanding spiral around the target.
- an allocator can take into account not only the distance of each block from the target point, but also the distance of each block from the previous block. The same measure can be used for measuring these two distances. Alternatively, different measures can be used, and these two distances can be added or otherwise combined with a relative weighting factor to give a combined distance. Weighing distance from the previous block heavily in deciding on the position of the next block favors coherence, while weighing absolute distance to the target point heavily favors spatial localization. Hence this scheme allows coherence and localization to be traded off in any manner desired by adjusting a single parameter.
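The chained-block allocation can be sketched as follows, under several assumptions: blocks are tracked on a small grid as a set of occupied (column, row) positions, distance is Euclidean with ties broken by scan order, and a single hypothetical `coherence` weight adds the distance from the previous block to the distance from the target. Only the block positions and next-block vectors are computed; writing payload bytes is omitted.

```python
import math


def blocks_needed(k, n, m):
    """Blocks required for k bytes when 2 pixels per n*m block hold a pointer."""
    return math.ceil(k / (n * m - 2))


def allocate_chunk(occupied, grid_w, grid_h, target, k, n=4, m=4, coherence=0.0):
    """Allocate a chain of blocks near `target` (in block-grid units).

    Returns (chain, links): block positions in order, and for each block the
    relative vector to the next one, with (0, 0) marking the end of the chain.
    """
    chain, prev = [], None
    for _ in range(blocks_needed(k, n, m)):
        best = None
        for by in range(grid_h):
            for bx in range(grid_w):
                if (bx, by) in occupied:
                    continue
                cost = math.hypot(bx - target[0], by - target[1])
                if prev is not None:
                    cost += coherence * math.hypot(bx - prev[0], by - prev[1])
                if best is None or cost < best[0]:
                    best = (cost, (bx, by))
        if best is None:
            raise MemoryError("no empty block left in the data image")
        occupied.add(best[1])
        chain.append(best[1])
        prev = best[1]
    links = [(nx - bx, ny - by) for (bx, by), (nx, ny) in zip(chain, chain[1:])]
    links.append((0, 0))                     # null pointer: last block in chain
    return chain, links


occupied = set()
chain, links = allocate_chunk(occupied, 8, 8, target=(3, 3), k=30)
print(chain, links)
```

Raising `coherence` favors blocks adjacent to the previous one, while leaving it at zero favors blocks closest to the target point, which is the single-parameter tradeoff described above.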
- Block size defines the memory granularity, analogously to "memory alignment" in an ordinary 1D (one-dimensional) memory. Large block sizes decrease data overhead, as the next-block pointers use a fraction 2/n^2 of the allocated space; and they also improve coherence. However, large block sizes increase wasted space, since whole blocks are allocated at a time. Wasted space may in turn worsen spatial localization.
- the appropriate choice of block size depends on the distribution of expected chunk lengths, as well as the spatial distribution of target points. Making the best choice is complex, and should in general be done by experiment using typical data.
- FIGS. 11-13 show data images (enhanced for high contrast) for one of the U.S. Virgin Islands using 4*4 blocks (FIG. 11), 6*6 blocks (FIG. 12) and 8*8 blocks (FIG. 13). Wasted space is drawn as white in FIGS. 11-13 for the sake of clarity (though in practice, for improved compression performance, wasted space is assigned value zero, or black). Clearly the 8*8 blocks both waste a great deal of space and offer poor localization, whereas the 4x4 blocks waste much less space and localize better. The 4*4 block image of FIG. 11 also compresses to a smaller file size than the other two.
- FIG. 5 shows the same densely populated region of a data image before flattening (FIG. 5A) and after flattening (FIG. 5B).
- the data image used in FIG. 5 is a roadmap data image of the Virgin Islands. It is noted that FIG. 5B has been deliberately darkened so as to be more visible in this application. In FIG. 5B as presented, the rectangular image as a whole is a faint shade of grey. Moreover, a small amount of the pixel value variation highly evident in FIG. 5A is still visible in FIG. 5B, mostly in the bottom half of the image.
- the consistency of the pixel values throughout the vast majority of the pixels of FIG. 5B bears witness to the effectiveness of the extent of the "flattening" of the data of FIG. 5A.
- the data image has full 8-bit dynamic range, and exhibits high frequencies and structured patterns that make it compress very poorly (in fact, a lossless JPEG2000-compressed version of this image is no smaller than the original raw size).
- the corresponding JPEG2000 compressed version of the image has better than 3:1 compression.
- “Flattening” can consist of a number of simple data transformations, including the following (this is the complete list of transformations applied in the example of FIG. 5): The Flattening Technique Applied to FIG. 5
- 16-bit unsigned values, such as the width or height of the data block, would normally be encoded using a high-order byte and a low-order byte.
- We may use 16 bits because the values to be encoded occasionally exceed 255 (the 8-bit limit) by some unspecified amount. However, in the majority of cases, these values do not exceed 255.
- the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of FIG. 5A.
- the left eight columns represent the first pixel of the pair, previously the high-order byte, and the rightmost eight columns represent the second pixel, previously the low-order byte.
- the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric.
- the two bytes each assume values below 16.
- Similar bit-interleaving techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value can be encoded in alternating bytes as above.
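Bit interleaving of a 16-bit unsigned value can be sketched as follows. The exact assignment of bits to the two pixels is not recoverable from the text (the table it refers to is missing), so sending odd-numbered bits to the first pixel and even-numbered bits to the second is an assumption; the function names are illustrative.

```python
def interleave16(value):
    """Split a 16-bit value into two bytes by alternating bits.

    Assumption: odd-numbered bits go to the first byte, even-numbered bits
    to the second, so both bytes stay small when the value is small.
    """
    first = second = 0
    for i in range(8):
        second |= ((value >> (2 * i)) & 1) << i
        first |= ((value >> (2 * i + 1)) & 1) << i
    return first, second


def deinterleave16(first, second):
    """Inverse of interleave16."""
    value = 0
    for i in range(8):
        value |= ((second >> i) & 1) << (2 * i)
        value |= ((first >> i) & 1) << (2 * i + 1)
    return value


# A conventional split of 255 gives bytes (0, 255); interleaving gives:
print(interleave16(255))
```

The full range 0 to 65535 remains representable, but any value up to 255 now yields two bytes that are both below 16, which removes the alternation between zero and large bytes.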
- road vector data may be represented at greater than pixel precision.
- Arbitrary units smaller than a pixel can instead be used, or equivalently, sub-pixel precision can be implemented using fixed point arithmetic in combination with the above techniques.
- 4 sub-pixel bits are used, for 1/16 pixel precision.
- because each data block is 2 or more pixels wide, we can subtract 2 from the data width before encoding. More significantly, both pointers and any position vectors encoded in a data block are specified in pixels relative to the pointer position, rather than in absolute coordinates. This not only greatly decreases the magnitude of the numbers to encode, it also allows a portion of the data image to be decoded and rendered vectorially in a local coordinate system without regard for the absolute position of this portion.
- the JPEG2000 representation of the map data (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text.
- This file is part of the United States Census Bureau's 2002 TIGER/Line database.
- the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
- the original prerendered multiscale map invention introduced in document (2) (which is attached hereto as Exhibit B) included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features).
- One or more embodiments of the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
- statewide pointer and data images, which are at much lower resolution than those used for prerendered images, might only include data for state and national highways, excluding all local roads.
- coarser data may also be "abstracts", for example specifying only road names, not vectors.
- Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
- the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vector data, relying on the client to composite the image and vector material appropriately.
- the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
- GUIs graphical computer user interfaces
- visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
- the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
- zoomable components such as Microsoft® Word ® and other Office ® products (Zoom under the View menu), Adobe ® Photoshop ®, Adobe ® Acrobat ®, QuarkXPress ®, etc.
- these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
- while continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
- any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
- Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s; recent movies continue the trend.
- a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
- the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have
- Voss zooming user interface framework
- This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
- a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid).
- the present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid.
- these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
- An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
- a further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
- a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
- a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
- a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
- a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
- a further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
- a further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
- zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
- One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
- RAM random access memory
- this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open” command and being able to begin viewing or editing unacceptably long.
- Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem.
- Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate images sizes for which interactive browsing is difficult or impossible at a particular connection speed.
- the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
- the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short.
- LODs levels of detail
- a low-resolution image serves as a "predictor" for the next higher resolution.
- This allows the entire image hierarchy to be encoded very efficiently; more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone. If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission.
- an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
- Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order.
- This model is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session.
- the computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
- each level of detail is the basic unit of transmission.
- the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
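The resolution hierarchy and its tiling can be sketched as below. Rounding odd dimensions up when halving and the nominal tile size of 64 pixels are illustrative assumptions; the function names are hypothetical.

```python
def pyramid_levels(width, height):
    """Image sizes from full resolution down to 1x1, halving at each step.
    Odd dimensions are rounded up, which is an assumption of this sketch."""
    levels = [(width, height)]
    while levels[-1] != (1, 1):
        w, h = levels[-1]
        levels.append(((w + 1) // 2, (h + 1) // 2))
    return levels


def tile_grid(width, height, tile=64):
    """Number of (columns, rows) of tiles needed to cover one level.
    Edge tiles may be smaller than the nominal tile size."""
    return (-(-width // tile), -(-height // tile))


for w, h in pyramid_levels(512, 512):
    cols, rows = tile_grid(w, h)
    print(f"{w}x{h}: {cols * rows} tile(s)")
```

For a 512x512 image with 64-pixel tiles, each finer level holds four times as many tiles as the one below it until the image becomes smaller than a single tile, after which every level is one tile.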
- the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
- the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
- This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages.
- Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined.
- Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc.
- the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile.
- the speed bottleneck is normally the transfer of compressed data to the computer (e.g. downloading).
- the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay.
- this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
- a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
- documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document.
- We call each such document a node, borrowing from programming terminology for trees.
- nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode” — which can be navigated by zooming and panning.
- There are a number of immediate consequences for a well-implemented zooming user interface, including:
- It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
- Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
- Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
- High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
- The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
- variable names f and g. f refers to the sampling density of a tile relative to the display, defined in #1.
- Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be
- the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f < 1, plus the next higher level of detail (to allow LOD blending; see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area, in general an area considerably larger than the display.
- the coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
- By robustness we mean that the client is never "at a loss" regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled.
- the client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node. Therefore, every spatial part of the node will always be renderable based on the first tile request alone; all subsequent tile requests can be considered incremental refinements.
- foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center. The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out — see Figure 4. The figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution.
- in the second row we follow what happens if the user were to zoom in on the shaded square before the image displayed in the top row is fully refined. Tiles at higher levels of detail are again queued, but in this case only those that are partially or fully visible. Refinement progresses to a point comparable to that of the top row (in terms of number of visible tiles on the display).
- the third row shows what is available if the user then zooms out again, and how the missing detail is filled in.
- every small constant interval of time corresponds to a constant percent change in the opacity; for example, the new tile may become 20% more opaque at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc.
- the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval.
- An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity.
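- As a minimal sketch of the exponential blend described above, assume that "20% more opaque at every frame" means that each frame closes 20% of the remaining gap to full opacity. The function name `blend_opacities` and its parameters are illustrative and do not come from the source.

```python
def blend_opacities(rate, frames):
    """Return the opacity after each frame when a fixed fraction (`rate`)
    of the remaining transparency is removed per frame (assumed reading)."""
    opacity = 0.0
    sequence = []
    for _ in range(frames):
        opacity += rate * (1.0 - opacity)  # close `rate` of the remaining gap
        sequence.append(opacity)
    return sequence


# Reproduce the sequence quoted in the text, rounded to whole percent.
percentages = [round(100 * o) for o in blend_opacities(0.20, 12)]
print(percentages)
```

Under this reading the opacity after n frames is 1 - (1 - rate)^n, which approaches but never reaches 100%, in line with the remark above.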
- Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges.
- Part (c) shows a bilinear method for interpolating opacity over a triangle, when the opacities of all three corners a, b and c may be different.
- every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C.
- the opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e.
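- The area-weighted interpolation over a triangle can be sketched as follows. The pairing of each corner with the sub-triangle opposite it is an assumption (it is the usual barycentric convention; the source text is truncated before it states the pairing), and all function names are illustrative.

```python
def triangle_area(p, q, r):
    """Unsigned area of the triangle with 2D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def interpolate_opacity(p, a, b, c, op_a, op_b, op_c):
    """Opacity at interior point p as an area-weighted sum of corner opacities.

    Assumption: the weight of a corner is the fractional area of the
    sub-triangle formed by p and the two *other* corners.
    """
    area_a = triangle_area(p, b, c)
    area_b = triangle_area(p, a, c)
    area_c = triangle_area(p, a, b)
    total = area_a + area_b + area_c
    return (area_a * op_a + area_b * op_b + area_c * op_c) / total


a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
at_corner = interpolate_opacity(a, a, b, c, 1.0, 0.5, 0.0)
at_centroid = interpolate_opacity((4.0 / 3.0, 4.0 / 3.0), a, b, c, 1.0, 0.5, 0.0)
```

With this pairing the interpolated value equals the corner opacity at each corner and the mean of the three opacities at the centroid.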
- this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
- 5. Generalized linear-mipmap-linear LOD blending.
- each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time.
- the target opacity is 100% if the level of detail undersamples the display, i.e. f < 1 (see #1).
- the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold.
- this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
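- The target opacity rule can be sketched as a function of the sampling density f and the granularity g. The linear ramp in f and the default g = 2 are assumptions made for illustration; the text allows any monotonic function, and the name `target_opacity` is hypothetical.

```python
def target_opacity(f, g=2.0):
    """Target opacity of a level of detail with sampling density f.

    Full opacity while the level undersamples the display (f <= 1),
    falling linearly to zero once the oversampling is g-fold (f >= g).
    """
    if f <= 1.0:
        return 1.0
    if f >= g:
        return 0.0
    return (g - f) / (g - 1.0)


for f in (0.5, 1.0, 1.5, 2.0):
    print(f, target_opacity(f))
```

Because the function is continuous at f = 1 and f = g, the perceived level of detail changes smoothly as the user zooms through a level.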
- the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet.
- a single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in” at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
- the simplest reference implementation for rendering the set of tile shards for a node is to use the so-called “painter's algorithm": all tile shards are rendered in back-to- front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold).
- the target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete.
- the highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above.
- this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
- 6. Motion anticipation. During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction). We therefore exploit this temporal motion coherence to generate tile requests slightly ahead of time, improving visual quality.
- the virtual viewport relaxes over a brief interval of time back to the real viewport.
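- A minimal sketch of motion anticipation, under stated assumptions: the viewport is described by a center and a logarithmic zoom level (so that steady zooming is linear), the virtual viewport is obtained by linear extrapolation from the last two frames, and it relaxes toward the real viewport by a fixed fraction per frame. The class and function names are illustrative, not from the source.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    cx: float         # center x
    cy: float         # center y
    log_scale: float  # log of the zoom factor (assumed representation)


def extrapolate(prev, curr, lookahead_frames):
    """Virtual viewport: continue the last frame-to-frame motion linearly."""
    k = lookahead_frames
    return Viewport(
        curr.cx + (curr.cx - prev.cx) * k,
        curr.cy + (curr.cy - prev.cy) * k,
        curr.log_scale + (curr.log_scale - prev.log_scale) * k,
    )


def relax(virtual, real, rate):
    """Move the virtual viewport a fraction `rate` of the way back to the real one."""
    return Viewport(
        virtual.cx + rate * (real.cx - virtual.cx),
        virtual.cy + rate * (real.cy - virtual.cy),
        virtual.log_scale + rate * (real.log_scale - virtual.log_scale),
    )


previous = Viewport(0.0, 0.0, 0.0)
current = Viewport(1.0, 2.0, 0.5)
virtual = extrapolate(previous, current, lookahead_frames=2)
relaxed = relax(virtual, current, rate=0.5)
```

Tile requests would then be generated for the virtual viewport rather than the real one while the motion continues.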
- the present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time.
- the invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
- the present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
- wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format).
- This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more.
- the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
- the granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid". Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured, using a much smaller amount of information, at the low resolutions. This is why the differently sized images or scales are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. (JPEG2000 reference: http://www.jpeg.org/JPEG2000.html)
- each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. Hence if we assume 64x64 pixel tiles, the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail.
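- The tile counts in this example can be checked with a short sketch. It assumes a square image whose side is halved at each coarser level and a fixed nominal tile size; the function name `tile_grid_per_level` is illustrative.

```python
def tile_grid_per_level(image_size, tile_size):
    """List (image side, tiles per side) from the finest level down to 1x1."""
    levels = []
    size = image_size
    while True:
        tiles_per_side = -(-size // tile_size)  # integer ceiling division
        levels.append((size, tiles_per_side))
        if size == 1:
            break
        size = max(1, size // 2)  # next coarser level of detail
    return levels


levels = tile_grid_per_level(512, 64)
for side, tiles in levels:
    print(f"{side}x{side} pixels: {tiles}x{tiles} tiles")
```

For the 512x512 image with 64x64 tiles this gives 8x8, 4x4 and 2x2 tiles at the three finest levels and a single tile at each remaining level.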
- the JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images.
- If a tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, is needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
- this technique relies on compositing in the framebuffer — meaning that, at intermediate points during the drawing operation, the regions drawn do not have their final appearance; this makes it necessary to use double-buffering or related methods and perform the compositing off-screen to avoid the appearance of flickering resolution.
- this technique can only be used for an opaque rendition; it is not possible, for example, to ensure that the final rendition has 50% opacity everywhere, allowing other content to "show through". This is because the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.
- the present invention resolves these issues, while preserving all the advantages of the painter's algorithm.
- One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application.
- Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons.
- the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant.
- the improved algorithm consists of four stages.
- a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn.
- If the irrational tiling innovation (detailed in a separate provisional patent application) is used, this results in an irregular composite grid, shown schematically in Figure 1.
- the grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing).
- This composite grid, which can be defined by a sorted list of x- and y-values for the grid lines, has the property that the vertices of all of the rectangles and triangles that would be needed to draw all visible tiles (including their blending flaps) lie at the intersection of an x and y grid line. Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n * m table, with entries corresponding to the squares of the grid.
- Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
- the second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naïve implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly.
- the per-tile opacity used in this step is stored in the tile data structure.
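- The second stage can be sketched as follows. This is an illustration under assumptions, not the reference implementation: opacities are treated as additive weights that sum to the target at each square, a tile's contribution is capped at the opacity still missing, and the names `Tile`, `GridEntry` and `assign_tiles` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    name: str
    lod: int        # level of detail; higher means finer
    opacity: float  # per-tile opacity, 0.0 to 1.0
    cells: list     # (row, col) composite grid squares the tile covers


@dataclass
class GridEntry:
    opacity: float = 0.0                        # accumulated opacity, starts at zero
    tiles: list = field(default_factory=list)   # (tile name, contribution) pairs


def assign_tiles(tiles, rows, cols, target=1.0):
    """Walk tiles from finest to coarsest, filling each square up to `target`."""
    table = [[GridEntry() for _ in range(cols)] for _ in range(rows)]
    for tile in sorted(tiles, key=lambda t: t.lod, reverse=True):
        for r, c in tile.cells:
            entry = table[r][c]
            remaining = target - entry.opacity
            if remaining <= 0.0:
                continue  # square already fully covered by finer tiles
            contribution = min(tile.opacity, remaining)  # assumed capping rule
            entry.tiles.append((tile.name, contribution))
            entry.opacity += contribution
    return table


tiles = [
    Tile("coarsest", 0, 1.0, [(0, 0), (0, 1)]),
    Tile("mid", 1, 1.0, [(0, 0), (0, 1)]),
    Tile("fine", 2, 0.5, [(0, 0)]),
]
table = assign_tiles(tiles, rows=1, cols=2)
```

In this example the coarsest tile is never referenced by any square, so it would not be drawn at all, which is the saving the text attributes to this ordering. Lowering `target` gives a partially transparent result.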
- the third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%).
- This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application.
- the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards. Finally, in the fourth stage the composite grid is again traversed, and the tile shards are actually drawn.
- Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting with the displayed area in its entirety.
- An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
- the composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal.
- the composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage. Note that a number of other implementation details can be modified to optimize performance; the algorithm is presented here in a form that makes its operation and essential features easiest to understand. A graphics programmer skilled in the art can easily add the optimizing implementation details.
- each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time.
- Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be kept track of, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited” at all if they are not needed.
- the algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers.
- This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in order n*log(n) time (where n is the number of vertices or triangles added).
- FIGURE 1
- A SYSTEM AND METHOD FOR MULTIPLE NODE DISPLAY
- the present invention relates to zooming user interfaces (ZUI) for computers.
- Most present-day graphical computer user interfaces are designed using visual components of a fixed spatial scale. The visual content can be manipulated by zooming in or out or otherwise navigating through it.
- the precision with which coordinates of various objects can be represented is extremely limited by the number of bits, usually between 16 and 64, designated to represent such coordinates. Because of their limited representational size, there is limited precision.
- In the context of the zooming user interface, the user is easily able to zoom in, causing the area which previously covered only a single pixel to fill the entire display. Conversely, the user may zoom out, causing the contents of the entire display to shrink to the size of a single pixel. Since each zoom in or out may multiply or divide the xy coordinates by numerous orders of magnitude, just a few such zooms completely exhaust the precision available with a 64-bit floating point number, for example. Thereafter, round-off causes noticeable degradation of image quality.
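- The precision limit can be illustrated with ordinary 64-bit floating point arithmetic. The sketch below only demonstrates round-off in a single global coordinate; the helper name `survives` and the chosen magnitudes are illustrative.

```python
import sys

# Number of significant binary digits in a Python float (IEEE 754 double).
mantissa_bits = sys.float_info.mant_dig


def survives(coordinate, offset):
    """True if adding `offset` to `coordinate` changes the stored value."""
    return coordinate + offset != coordinate


# A one-unit detail is still representable next to a coordinate of 1e15,
# but is lost to round-off next to a coordinate of 1e16.
print(mantissa_bits, survives(1e15, 1.0), survives(1e16, 1.0))
```

Once the ratio between the overall extent and the detail being viewed exceeds roughly 2 to the power of `mantissa_bits`, positions expressed in one global coordinate system can no longer be distinguished.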
- a further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
- a further objective of the present invention is to allow a node, a visual object as defined more precisely below, to have a very large number of child nodes (for example, up to 10^28).
- a further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
- a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if the data are stored at a remote location and shared over a low-bandwidth network.
- a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
- a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
- Each node preferably has its own coordinate system and rendering method, but may be contained within a parent node, and may be represented in the coordinate system and rendering method of the parent node.
- a node is only "launched" when the zooming results in an appropriate level of detail.
- the launching of the node causes the node to be represented in its own coordinate system and/or rendering method, rather than in the coordinate system and/or rendering method of a different node.
- Prior to the node being launched, the node is either represented in the coordinate system of the parent node, or not represented at all.
- the precision of a coordinate system is a function of the zoom level of detail of what is being displayed. This allows a variable level of precision, up to and including the maximum permissible by the memory of the computer in which the system operates.
- FIG. 1 is a depiction of visual content on a display.
- FIG. 2 is an image of the visual content of FIG. 1 at a different level of detail.
- FIG. 3 is a representation of an embodiment of the invention.
- FIG. 4 is an exemplary embodiment of the invention showing plural nodes on a display.
- FIG. 5 is a tree diagram corresponding to the exemplary embodiment shown in FIG. 4.
- the exemplary universe in turn contains 2D objects, or nodes, which have a visual representation, and may also be dynamic or interactive (e.g. video clips, applications, editable text documents, CAD drawings, or still images).
- For a node to be visible it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display.
- Each node is also endowed with a local coordinate system of finite precision. For illustrative purposes, we assume a node is rectangular and represented by a local coordinate system.
- These two parameters, the rendering method and coordinate system, specify how to display the node, and the positions of items in the node.
- Each node may have 0 or more child nodes, which are addressed by reference.
- the node need not, and generally does not, contain all the information of each child node, but instead only an address providing information necessary to obtain the child node.
- the nodes are displayed on the screen, as shown, for example in FIG. 1.
- A node is the basic unit of functionality in the present invention. Most nodes manifest visually on the user's display during navigation, and some nodes may also be animated and/or respond to user input.
- Nodes are hierarchical, in that a node may contain child nodes. The containing node is then called a parent node. When a parent node contains a child node, the child's visual manifestation is also contained within the parent's visual manifestation.
- Each node has a logical coordinate system, such that the entire extent of the node is contained within an exemplary rectangle defined in this logical coordinate system; e.g. a node may define a logical coordinate system such that it is contained in the rectangle (0,0)-(100,100).
- Each node may have the following data defining its properties:
  - the node's logical coordinate system, including its logical size (100 x 100 in the above example);
  - the identities, positions and sizes of any child nodes, specified in the (parent) node's logical coordinates;
  - optionally, any necessary user data;
  - executable code defining these operations or "methods":
    - initialization of the node's data based on "construction arguments";
    - rendering all or a portion of the node's visual appearance (the output of this method is a rendered tile);
    - optionally, responding to user input, such as keyboard or mouse events.
- the executable code defines a "node class", and may be shared among many "node instances". Node instances differ in their data content. Hence a node class might define the logic needed to render a JPEG image.
- the "construction arguments" given to the initialization code would then include the URL of the JPEG image to display.
- a node displaying a particular image would be an instance of the JPEG node class.
- Plural instances of a node may be viewable in the same visual content, similar to the way a software application may be instantiated numerous times simultaneously.
- buttons may all have the same behavior, and hence all be instances of the same node class; the images may all be in the same format and so also be instances of a common node class, etc. This also simplifies rearranging the layout — the parent node can easily move or resize the child nodes.
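- The relationship between node classes and node instances can be sketched in code. This is an illustration only: the class names, the tuple layout for child references, the placeholder return value of `render` and the example URLs are all assumptions, and no image is actually fetched or decoded.

```python
class Node:
    """Base node class: the code is shared, each instance carries its own data."""

    def __init__(self, logical_size=(100.0, 100.0)):
        self.logical_size = logical_size  # extent of the logical coordinate system
        self.children = []                # (address, x, y, width, height) in this node's coordinates

    def add_child(self, address, x, y, width, height):
        # Children are held by reference (address), not by value.
        self.children.append((address, x, y, width, height))

    def render(self, region):
        raise NotImplementedError


class JpegNode(Node):
    """Node class holding the logic for one image format."""

    def __init__(self, url, logical_size=(100.0, 100.0)):
        super().__init__(logical_size)
        self.url = url  # the "construction argument" for this instance

    def render(self, region):
        # Placeholder: a real implementation would decode and return pixels.
        return {"source": self.url, "region": region}


photo_a = JpegNode("http://example.com/a.jpg")
photo_b = JpegNode("http://example.com/b.jpg")
parent = Node()
parent.add_child(photo_a.url, 10, 10, 20, 20)
parent.add_child(photo_b.url, 40, 10, 20, 20)
```

Both photos are instances of the same node class and differ only in their data, and the parent can rearrange its layout by editing the positions and sizes it stores for its children.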
- Fig. 1 shows a node 105 which may be the image of a portion of the city.
- Node 105 may contain child nodes 101-103.
- Node 101 may be an image of a building in the city, node 102 could be an image of a playground, and node 103 might be a sports arena.
- nodes 101-103 are relatively small, so they can be represented as a small darkened area with no detail in node 105, located at the correct location in the coordinate system of node 105. Only the coordinate system and the rendering method of node 105 are needed.
- sports arena node 103 would now be displayed not as a darkened area with no detail in the coordinate system of node 105, but rather, it would be "launched" to be displayed using its own coordinate system and rendering method. When displayed using its own coordinate system and rendering method, the details such as seating, the field of play, etc. would be individually shown. Other functions discussed above, and associated with the node 103, would also begin executing at the point when node 103 is launched.
- the particular navigation condition that causes the launching of node 103, or any node for that matter, is a function of design choice and is not critical to the present invention.
- the precision with which the node 103 will be displayed is the combined precision of the coordinate system utilized by node 105, as well as that of node 103.
- If the coordinate systems of node 105 and node 103 each use 8 bits of precision, the combined precision will be 16 bits, because the coordinate system of node 103 is only utilized to specify the position of items in node 103, but the overall location of node 103 within node 105 is specified within the coordinate system of node 105. Note that this nesting may continue repeatedly if sports arena 103 itself contains additional nodes within it. For example, one such node 201 may in fact be a particular concession stand within the sports arena. It is represented without much detail in the coordinate system and rendering method of node 103.
- node 201 will launch. If it is displayed using 8 bits of precision, those 8 bits will specify where within the node 201 coordinate system particular items are to be displayed. Yet, the location of node 201 within node 103 will be maintained to 8 bits of precision within the coordinate system of node 103, the location of which will in turn be maintained within the coordinate system of node 105 using 8 bits. Hence, items within node 201 will ultimately be displayed using 24 bits of precision.
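- The way nested coordinate systems add up to a finer combined precision can be sketched in one dimension. The sketch assumes, purely for simplicity, that each child occupies exactly one addressable position of its parent; the function name `resolve` and the sample coordinates are illustrative.

```python
from fractions import Fraction


def resolve(path):
    """Position along one axis, as a fraction of the outermost node's extent.

    `path` lists (bits, coordinate) pairs from the outermost node inward.
    Each coordinate selects one of 2**bits positions inside the cell chosen
    by the previous level (a simplifying assumption).
    """
    position = Fraction(0)
    cell = Fraction(1)
    for bits, coordinate in path:
        cell /= 2 ** bits
        position += coordinate * cell
    return position


# Node 105, then node 103, then node 201, each with 8 bits of precision.
path = [(8, 3), (8, 200), (8, 17)]
total_bits = sum(bits for bits, _ in path)
position = resolve(path)
print(total_bits, position)
```

Three 8-bit coordinate systems locate an item to one part in 2^24 of the outermost node, matching the 24 bits described above, while no single stored coordinate needs more than 8 bits.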
- the precision at which visual content may ultimately be displayed is limited only by the memory capacity of the computer.
- the ultimate precision with which visual content in a node is displayed after that node is launched is effectively the combined precision of all parent nodes and the precision of the node that has launched.
- the precision may increase as needed limited only by the storage capacity of the computer, which is almost always much more than sufficient.
- the increased precision is only utilized when necessary, because if the image is at an LOD that does not require launching, then in accordance with the above description, it will only be displayed with the precision of the node within which it is contained if that node has been launched.
- An additional issue solved by the present invention relates to a system for maintaining the spatial interrelationship among all nodes during display. More particularly, during dynamic navigation such as zooming and panning, many different coordinate systems are being used to display potentially different nodes. Some nodes, as explained above, are being displayed merely as an image in the coordinate system of other nodes, and some are being displayed in their own coordinate systems. Indeed, the entire visual display may be populated with nodes displayed at different positions in different coordinate systems, and the coordinate systems and precisions used for the various nodes may vary during navigation as nodes are launched.
- the present invention provides a technique for propagating relative location information among all of the nodes and for updating that information when needed so that each node will "know" the proper position in the overall view at which it should render itself.
- the expanded node definition includes a field which we term the "view" field, and which is used by the node to locate itself relative to the entire display.
- the view field represents, in the coordinates of that node, the visible area of the node — that is, the image of the display rectangle in the node's coordinates. This rectangle may only partially overlap the node's area, as when the node is partially off-screen.
- the view field cannot always be kept updated for every node, as we cannot necessarily traverse the entire directed graph of nodes in real time as navigation occurs.
- the stack structure is defined thus: Stack<Address> viewStack; where this stack is a global variable of the client (the computer connected to the display).
- the viewStack will specify the addresses of a sequence of nodes "pierced" by a point relative to the display, which we will take in our exemplary implementation to be the center of the display. This sequence must begin with the root node, but may be infinite, and therefore must be truncated. In an exemplary embodiment, the sequence is truncated when the nodes "pierced" become smaller than some minimum size, defined as minimumArea.
- the current view is then represented by the view fields of all of the nodes in the viewStack, each of which specifies the current view in terms of the node's local coordinate system.
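- Building the viewStack can be sketched as a descent from the root through the child pierced by the display center. For brevity this sketch places every node directly in display coordinates, which is a simplification of the per-node logical coordinates described in the text; the `Node` fields and the function `build_view_stack` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    address: str
    x: float       # placement on the display (simplifying assumption)
    y: float
    width: float
    height: float
    children: list = field(default_factory=list)

    def contains(self, point):
        px, py = point
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def area(self):
        return self.width * self.height


def build_view_stack(root, center, minimum_area):
    """Addresses of the nodes pierced by `center`, truncated at small nodes."""
    view_stack = [root.address]  # the sequence always begins with the root
    node = root
    while True:
        pierced = [child for child in node.children
                   if child.contains(center) and child.area() >= minimum_area]
        if not pierced:
            return view_stack
        node = pierced[0]
        view_stack.append(node.address)


speck = Node("speck", 399, 299, 2, 2)
arena = Node("arena", 300, 200, 200, 200, [speck])
city = Node("city", 0, 0, 800, 600, [arena])
view_stack = build_view_stack(city, center=(400, 300), minimum_area=16)
```

Here the stack ends at "arena" because the next pierced node covers only 4 square units on the display, below the chosen minimum area.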
- the last element's view field does not, however, specify the user's viewpoint relative to the entire universe, but only relative to its local coordinates.
- the view field of the root node does specify where in the universe the user is looking. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to progressively smaller areas in the universe. This is shown conceptually in FIG.
- node 303 provides the most accurate indication of where the user is looking, since its coordinate system is the "finest", but node 301 provides information, albeit not as fine, on a much larger area of the visual content.
- Changing the view during any navigation operation proceeds as follows. Because the last node in the viewStack has the most precise representation of the view, the first step is to alter the view field of this last node; this altered view is taken to be the correct new view, and any other visible nodes must follow along. The second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the stack. If the user is deeply zoomed, then at some point in the upward propagation the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node. At each stage of the upward propagation, the change is also propagated downward to other visible nodes.
- the last node's parent's view is modified; then, in the downward propagation, the last node's "siblings".
- the downward propagation is halted, as before, when the areas of "cousin nodes" become smaller than minimumArea, or when a node falls entirely offscreen.
- The foregoing technique involves translating the layout of the various nodes into a tree, which conceptually is illustrated in FIGs. 4 and 5. As can be seen from FIGs. 4 and 5, there is a corresponding tree for a particular displayed set of nodes, and the tree structure may be used to propagate the view information as previously described.
- a panning operation may move the last node far enough away that it no longer belongs in the viewStack.
- zooming in might enlarge a child to the extent that a lengthening of the viewStack is required, or zooming out might bring the last node's area below a minimum area requiring a truncation of the viewStack.
- identity of the last node changes.
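The construction and truncation of the viewStack described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Node` class, the unit-square local coordinates, and the function name `build_view_stack` are all assumptions introduced here, and the propagation of view rectangles is omitted. The sketch only shows how the stack lengthens when zooming in enlarges a child and shortens when a node's on-screen area drops below `minimum_area`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Node:
    """Hypothetical node; each node's own area is the unit square [0, 1) x [0, 1)."""
    name: str
    # Each child is placed in a rectangle given in THIS node's coordinates.
    children: List[Tuple["Node", Rect]] = field(default_factory=list)


def build_view_stack(root: Node, point: Tuple[float, float],
                     root_area_px: float, minimum_area: float) -> List[str]:
    """Return the names of the nodes pierced by `point`, root first.

    `point` is given in the root's coordinates. The sequence is truncated
    as soon as the next pierced node would cover less than `minimum_area`
    square pixels on the display.
    """
    stack = [root.name]
    node, (px, py), area = root, point, root_area_px
    while True:
        hit = None
        for child, (x0, y0, x1, y1) in node.children:
            if x0 <= px < x1 and y0 <= py < y1:
                hit = (child, (x0, y0, x1, y1))
                break
        if hit is None:
            return stack  # the point pierces no further node
        child, (x0, y0, x1, y1) = hit
        area *= (x1 - x0) * (y1 - y0)  # on-screen area of the child
        if area < minimum_area:
            return stack  # truncate: the child is too small to matter
        # Re-express the point in the child's own unit-square coordinates.
        px, py = (px - x0) / (x1 - x0), (py - y0) / (y1 - y0)
        stack.append(child.name)
        node = child


leaf = Node("c")
mid = Node("b", [(leaf, (0.25, 0.25, 0.75, 0.75))])
root = Node("a", [(mid, (0.0, 0.0, 0.5, 0.5))])
print(build_view_stack(root, (0.25, 0.25), 1_000_000, 1_000))
```

Raising `minimum_area` (or, equivalently, zooming out so that `root_area_px` shrinks) shortens the returned stack, which corresponds to the truncation described in the text.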
- the present invention relates to methods and apparatus for navigating, such as zooming and panning, over an image of an object in such a way as to provide the appearance of smooth, continuous navigational movement.
- graphical computer user interfaces (GUIs)
- visual components may be represented and manipulated such that they do not have a fixed spatial scale on the display; indeed, the visual components may be panned and/or zoomed in or out.
- the ability to zoom in and out on an image is desirable in connection with, for example, viewing maps, browsing through text layouts such as newspapers, viewing digital photographs, viewing blueprints or diagrams, and viewing other large data sets.
- Many existing computer applications, such as Microsoft Word, Adobe Photoshop, Adobe Acrobat, etc., include zoomable components.
- the zooming capability provided by these computer applications is a peripheral aspect of a user's interaction with the software and the zooming feature is only employed occasionally.
- These computer applications permit a user to pan over an image smoothly and continuously (e.g., utilizing scroll bars or the cursor to translate the viewed image left, right, up or down).
- a significant problem with such computer applications is that they do not permit a user to zoom smoothly and continuously. Indeed, they provide zooming in discrete steps, such as 10%, 25%, 50%, 75%, 100%, 150%, 200%, 500%, etc. The user selects the desired zoom using the cursor and, in response, the image changes abruptly to the selected zoom level.
- FIGS. 1-4 are examples of images that one may obtain from the MapQuest website in response to a query for a regional map of Long Island, NY, U.S.A.
- the MapQuest website permits the user to zoom in and zoom out to discrete levels, such as 10 levels.
- FIG. 1 is a rendition at zoom level 5, which is approximately 100 meters/pixel.
- FIG. 2 is an image at a zoom level 6, which is about 35 meters/pixel.
- FIG. 3 is an image at a zoom level 7, which is about 20 meters/pixel.
- FIG. 4 is an image at a zoom level 9, which is about 10 meters/pixel.
- A1, primary highways; A2, primary roads; A3, state highways, secondary roads, and connecting roads; A4, local streets, city streets and rural roads; and A5, dirt roads.
- These roads may be considered the elements of an overall object (i.e., a roadmap).
- the coarseness of the road elements manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads.
- the physical dimensions of the roads (e.g., their widths) vary significantly.
- A1 roads may be about 16 meters wide
- A2 roads may be about 12 meters wide
- A3 roads may be about 8 meters wide
- A4 roads may be about 5 meters wide
- A5 roads may be about 2.5 meters wide.
- the MapQuest computer application deals with these varying levels of coarseness by displaying only the road categories deemed appropriate at a particular zoom level. For example, a nation-wide view might only show A1 roads, while a state-wide view might show A1 and A2 roads, and a county-wide view might show A1, A2 and A3 roads. Even if MapQuest were modified to allow continuous zooming of the roadmap, this approach would lead to the sudden appearance and disappearance of road categories during zooming, which is confusing and visually displeasing.
- methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
- At least some elements of the at least one object may also be scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
- the elements of the object may be of varying degrees of coarseness.
- the coarseness of the elements of a roadmap object manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads.
- Degree of coarseness in road categories also manifests in such properties as average road length, frequency of intersections, and maximum curvature.
- the coarseness of the elements of other image objects may manifest in other ways too numerous to list in their entirety.
- the scaling of the elements in a given predetermined image may be physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given predetermined image.
- the object may be a roadmap
- the elements of the object may be roads
- the varying degrees of coarseness may be road hierarchies.
- the scaling of a given road in a given predetermined image may be physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given predetermined image.
- methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
- methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal .
- methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of a roadmap to a client terminal over a communications channel; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
- methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of at least one object to a client terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; blending two of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
- FIG. 1 is an image taken from the MapQuest website, which is at a zoom level 5;
- FIG. 2 is an image taken from the MapQuest website, which is at a zoom level 6;
- FIG. 3 is an image taken from the MapQuest website, which is at a zoom level 7;
- FIG. 4 is an image taken from the MapQuest website, which is at a zoom level 9;
- FIG. 5 is an image of Long Island produced at a zoom level of about 334 meters/pixel in accordance with one or more aspects of the present invention
- FIG. 6 is an image of Long Island produced at a zoom level of about 191 meters/pixel in accordance with one or more further aspects of the present invention.
- FIG. 7 is an image of Long Island produced at a zoom level of about 109.2 meters/pixel in accordance with one or more further aspects of the present invention.
- FIG. 8 is an image of Long Island produced at a zoom level of about 62.4 meters/pixel in accordance with one or more further aspects of the present invention.
- FIG. 10 is an image of Long Island produced at a zoom level of about 20.4 meters/pixel in accordance with one or more further aspects of the present invention.
- FIG. 11 is an image of Long Island produced at a zoom level of about 11.7 meters/pixel in accordance with one or more further aspects of the present invention.
- FIG. 12 is a flow diagram illustrating process steps that may be carried out in order to provide smooth and continuous navigation of an image in accordance with one or more aspects of the present invention
- FIG. 13 is a flow diagram illustrating further process steps that may be carried out in order to smoothly navigate an image in accordance with various aspects of the present invention
- FIG. 14 is a log-log graph of a line width in pixels versus a zoom level in meters/pixel illustrating physical and non-physical scaling in accordance with one or more further aspects of the present invention.
- FIG. 15 is a log-log graph illustrating variations in the physical and non-physical scaling of FIG. 14.
- FIGS. 16A-D illustrate respective antialiased vertical lines whose endpoints are precisely centered on pixel coordinates
- FIGS. 17A-C illustrate respective antialiased lines on a slant, with endpoints not positioned to fall at exact pixel coordinates
- FIG. 18 is the log-log graph of line width versus zoom level of FIG. 14 including horizontal lines indicating incremental line widths, and vertical lines spaced such that the line width over the interval between two adjacent vertical lines changes by no more than two pixels.
- FIGS. 5-11 are a series of images representing the road system of Long Island, NY, U.S.A., where each image is at a different zoom level (or resolution).
- the image 100A of the roadmap illustrated in FIG. 5 is at a zoom level that may be characterized by units of physical length/pixel (or physical linear size/pixel).
- the zoom level, z, represents the actual physical linear size that a single pixel of the image 100A represents.
- the zoom level is about 334 meters/pixel.
- FIG. 6 is an image 100B of the same roadmap as FIG.
- Another significant feature of the present invention as illustrated in FIGS. 5-11 is that little or no detail abruptly appears or disappears when zooming from one level to another level.
- the roadmap 100D of FIG. 8 includes at least A1 highways such as 102, A3 secondary roads such as 104, and A4 local roads such as 106. Yet these details, even the A4 local roads 106, may still be seen in image 100A of FIG. 5, which is substantially zoomed out in comparison with the image 100D of FIG. 8.
- the A1, A2, A3, and A4 roads may be distinguished from one another. Even differences between A1 primary highways 102 and A2 primary roads 108 may be distinguished from one another vis-a-vis the relative weight given to such roads in the rendered image 100A.
- the user may wish to gain a general sense of what primary highways exist and in what directions they extend. This information may readily be obtained even though the A4 local roads 106 are also depicted.
- FIGS. 12-13 are flow diagrams illustrating process steps that are preferably carried out by the one or more computing devices and/or related equipment.
- although the process flow is preferably carried out by commercially available computing equipment (such as Pentium-based computers), any of a number of other techniques may be employed to carry out the process steps without departing from the spirit and scope of the present invention as claimed.
- the hardware employed may be implemented utilizing any other known or hereinafter developed technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, one or more programmable digital devices or systems, such as programmable read only memories (PROMs), programmable array logic devices (PALs), any combination of the above, etc.
- FIG. 12 illustrates an embodiment of the invention in which a plurality of images are prepared (each at a different zoom level or resolution), action 200, and two or more of the images are blended together to achieve the appearance of smooth navigation, such as zooming (action 206).
- a service provider would expend the resources to prepare a plurality of pre-rendered images (action 200) and make the images available to a user's client terminal over a communications channel, such as the Internet (action 202).
- the pre-rendered images may be an integral or related part of an application program that the user loads and executes on his or her computer.
- the client terminal, in response to user-initiated navigation commands (action 204), such as zooming commands, is preferably operable to blend two or more images in order to produce an intermediate resolution image that coincides with the navigation command (action 206).
- This blending may be accomplished by a number of methods, such as the well-known trilinear interpolation technique described by Lance Williams, Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH 83, 17(3): 1-11 (1983), the entire disclosure of which is incorporated herein by reference.
- Other approaches to image interpolation are also useful in connection with the present invention, such as bicubic-linear interpolation, and still others may be developed in the future.
- the present invention does not require or depend on any particular one of these blending methods.
- the user may wish to navigate to a zoom level of 62.4 meters/pixel.
- this zoom level may be between two of the pre-rendered images (e.g., in this example between zoom level 50 meters/pixel and zoom level 75 meters/pixel)
- the desired zoom level of 62.4 meters/pixel may be achieved using the trilinear interpolation technique.
- any zoom level between 50 meters/pixel and 75 meters/pixel may be obtained utilizing a blending method as described above, which if performed quickly enough provides the appearance of smooth and continuous navigation.
- the blending technique may be carried through to other zoom levels, such as the 35.7 meters/pixel level illustrated in FIG. 9. In such case, the blending technique may be performed as between the pre-rendered images of 30 meters/pixel and 50 meters/pixel of the example discussed thus far.
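The selection of the two pre-rendered images that bracket a requested zoom level, and of a blend weight between them, can be sketched as below. This is only an illustration under stated assumptions: the function name `blend_plan` is hypothetical, and the weight is interpolated linearly in the logarithm of the zoom level (similar in spirit to the fractional level of detail used with image pyramids), which the text does not prescribe. The pre-rendered levels of 30, 50 and 75 meters/pixel are the ones used in the example above.

```python
import math
from bisect import bisect_left
from typing import Sequence, Tuple


def blend_plan(target: float, levels: Sequence[float]) -> Tuple[float, float, float]:
    """Pick the bracketing pre-rendered zoom levels for `target`.

    `levels` are the zoom levels (meters/pixel) of the pre-rendered images,
    sorted in ascending order. Returns (finer, coarser, alpha), where the
    displayed image would be (1 - alpha) * finer_image + alpha * coarser_image
    after both have been resampled to the display resolution.
    """
    if not levels[0] <= target <= levels[-1]:
        raise ValueError("target zoom lies outside the pre-rendered range")
    i = bisect_left(levels, target)
    if levels[i] == target:
        return target, target, 0.0  # an exact pre-rendered level exists
    finer, coarser = levels[i - 1], levels[i]
    # Assumption: interpolate in log-zoom space rather than linearly in z.
    alpha = math.log(target / finer) / math.log(coarser / finer)
    return finer, coarser, alpha


levels = [30.0, 50.0, 75.0]
print(blend_plan(62.4, levels))  # brackets 50 and 75 meters/pixel
print(blend_plan(35.7, levels))  # brackets 30 and 50 meters/pixel
```

Recomputing `alpha` for every frame as the requested zoom changes is what gives the impression of continuous zooming between the discrete pre-rendered levels.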
- the above blending approach may be used when the computing power of the processing unit on which the invention is carried out is not high enough to (i) perform the rendering operation in the first instance, and/or (ii) perform image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation.
- other embodiments of the invention contemplate use of known, or hereinafter developed, high power processing units that are capable of rendering at the client terminal for blending and/or high frame rate applications.
- FIG. 13 illustrates the detailed steps and/or actions that are preferably conducted to prepare one or more images in accordance with the present invention.
- the information is obtained regarding the image object or objects using any of the known or hereinafter developed techniques.
- image objects have been modeled using appropriate primitives, such as polygons, lines, points, etc.
- UTM (Universal Transverse Mercator)
- the model is usually in the form of a list of line segments (in any coordinate system) that comprise the roads in the zone.
- the list may be converted into an image in the spatial domain (a pixel image) using any of the known or hereinafter developed rendering processes so long as it incorporates certain techniques for determining the weight (e.g., apparent or real thickness) of a given primitive in the pixel (spatial) domain.
- the rendering processes should incorporate certain techniques for determining the weight of the lines that model the roads of the roadmap in the spatial domain. These techniques will be discussed below.
- the elements of the object are classified.
- the classification may take the form of recognizing already existing categories, namely, A1, A2, A3, A4, and A5. Indeed, these road elements have varying degrees of coarseness and, as will be discussed below, may be rendered differently based on this classification.
- mathematical scaling is applied to the different road elements based on the zoom level. As will be discussed in more detail below, the mathematical scaling may also vary based on the element classification.
- the pre-set pixel width approach dictates that every road is a certain pixel width, such as one pixel in width on the display.
- Major roads, such as highways, may be emphasized by making them two pixels wide, etc.
- this approach makes the visual density of the map change as one zooms in and out. At some level of zoom, the result might be pleasing, e.g., at a small-size county level. As one zooms in, however, roads would not thicken, making the map look overly sparse. Further, as one zooms out, roads would run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable.
- the images are produced in such a way that at least some image elements are scaled up and/or down either (i) physically proportional to the zoom level; or (ii) non-physically proportional to the zoom level, depending on parameters that will be discussed in more detail below.
- scaling being "physically proportional to the zoom level" means that the number of pixels representing the road width increases or decreases with the zoom level as the size of an element would appear to change with its distance from the human eye.
- a may be set to a power law other than -1, and d' may be set to a physical linear size other than the actual physical linear size d.
- p may represent the displayed width of a road in pixels and d' may represent an imputed width in physical units
- "non-physically proportional to the zoom level" means that the road width in display pixels increases or decreases with the zoom level in a way other than being physically proportional to the zoom level, i.e. a ≠ -1.
- the scaling is distorted in a way that achieves certain desirable results.
- linear size means one-dimensional size.
- the linear sizes of the elements of an object may involve length, width, radius, diameter, and/or any other measurement that one can read off with a ruler on the Euclidean plane.
- the thickness of a line, the length of a line, the diameter of a circle or disc, the length of one side of a polygon, and the distance between two points are all examples of linear sizes.
- the "linear size" in two dimensions is the distance between two identified points of an object on a 2D Euclidean plane.
- Any power law a < 0 will cause the rendered size of an element to decrease as one zooms out, and increase as one zooms in. When a < -1, the rendered size of the element will decrease faster than it would with proportional physical scaling as one zooms out. Conversely, when -1 < a < 0, the size of the rendered element decreases more slowly than it would with proportional physical scaling as one zooms out.
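The effect of the exponent can be illustrated numerically. The relation p = d' · z^a used below is inferred from the definitions in the text (p is the rendered width in pixels, d' an imputed width in physical units, z the zoom level in meters/pixel, and a the power law); the function name `rendered_width` and the sample values are illustrative assumptions only.

```python
import math


def rendered_width(d_imputed: float, z: float, a: float = -1.0) -> float:
    """Rendered width in pixels, assuming the power law p = d' * z**a.

    With a = -1 and d' equal to the real width d, this is physical scaling.
    """
    return d_imputed * z ** a


d = 16.0  # meters, the approximate width quoted for A1 roads
for a in (-2.0, -1.0, -0.5):
    widths = [rendered_width(d, z, a) for z in (1.0, 10.0, 100.0)]
    print(f"a = {a:+.1f}:", ["%.4f" % w for w in widths])

# Zooming out from 1 to 100 meters/pixel, the width shrinks fastest for
# a < -1 and slowest for -1 < a < 0, as stated in the text.
assert math.isclose(rendered_width(d, 0.5), 32.0)
```

The case -1 < a < 0 is the one that keeps important elements visible when zoomed far out, at the price of drawing them wider than their physical size would warrant.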
- p(z), for a given length of a given object, is permitted to be substantially continuous so that during navigation the user does not experience a sudden jump or discontinuity in the size of an element of the image (as opposed to the conventional approaches that permit the most extreme discontinuity, a sudden appearance or disappearance of an element during navigation).
- p(z) decreases monotonically with zooming out, such that zooming out causes the elements of the object to become smaller (e.g., roads to become thinner), and such that zooming in causes the elements of the object to become larger. This gives the user a sense of physicality about the object(s) of the image.
- (i) the scaling of the road widths may be physically proportional to the zoom level when zoomed in (e.g., up to about 0.5 meters/pixel); (ii) the scaling of the road widths may be non-physically proportional to the zoom level when zoomed out (e.g., above about 0.5 meters/pixel); and (iii) the scaling of the road widths may be physically proportional to the zoom level when zoomed further out (e.g., above about 50 meters/pixel or higher depending on parameters which will be discussed in more detail below).
- a = -1.
- z0 = 0.5 meters/pixel, or 2 pixels/meter, which when expressed as a map scale on a 15 inch display (with 1600x1200 pixel resolution) corresponds to a scale of about 1:2600.
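The quoted map scale can be checked with a short calculation. The sketch below assumes that the 15 inch figure is the display diagonal and that pixels are square; the function name `map_scale` is introduced here for illustration.

```python
import math


def map_scale(z_m_per_px: float, diagonal_in: float, res_x: int, res_y: int) -> float:
    """Return N such that the map scale is 1:N.

    Assumes `diagonal_in` is the screen diagonal in inches and that the
    pixels are square.
    """
    diagonal_px = math.hypot(res_x, res_y)            # 2000 px for 1600x1200
    screen_m_per_px = diagonal_in * 0.0254 / diagonal_px
    return z_m_per_px / screen_m_per_px


scale = map_scale(0.5, 15.0, 1600, 1200)
print(f"about 1:{scale:.0f}")
```

Under these assumptions each screen pixel is about 0.19 mm across, so 0.5 meters/pixel works out to roughly 1:2600, in agreement with the figure in the text.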
- with d = 16 meters, which is a reasonable real physical width for A1 roads, the rendered road will appear to be its actual size when one is zoomed in (0.5 meters/pixel or less).
- the rendered line is about 160 pixels wide.
- the rendered line is 32 pixels wide.
- when -1 < a < 0, the width of the rendered road decreases more slowly than it would with proportional physical scaling as one zooms out.
- this permits the A1 road to remain visible (and distinguishable from other smaller roads) as one zooms out. For example, as shown in FIG.
- the width of the rendered line using physical scaling would have been about 0.005 pixels at a zoom level of about 3300 meters/pixel, rendering it virtually invisible.
- the width of the rendered line is about 0.8 pixels at a zoom level of 3300 meters/pixel, rendering it clearly visible.
- the value for z1 is chosen to be the most zoomed-out scale at which a given road still has "greater than physical" importance.
- the resolution would be approximately 3300 meters/pixel or 3.3 kilometers/pixel. If one looks at the entire world, then there may be no reason for U.S. highways to assume enhanced importance relative to the view of the country alone.
- the scaling of the road widths is again physically proportional to the zoom level, but preferably with a large d' (much greater than the real width d) for continuity of p(z).
- z1 and the new value for d' are preferably chosen in such a way that, at the outer scale z1, the rendered width of the line will be a reasonable number of pixels.
- A1 roads may be about ½ pixel wide, which is thin but still clearly visible; this corresponds to an imputed physical road width of 1650 meters, or 1.65 kilometers.
- p(z) has six parameters: z0, z1, d0, d1, d2 and a.
- z0 and z1 mark the scales at which the behavior of p(z) changes.
- zooming is physical (i.e., the exponent of z is -1), with a physical width of d0, which preferably corresponds to the real physical width d.
- zooming is again physical, but with a physical width of d1, which in general does not correspond to d.
- the rendered line width scales with a power law of a, which can be a value other than -1.
- p(z) continuous
- specifying z0, z1, d0 and d2 is sufficient to uniquely determine d1 and a, which is clearly shown in FIG. 14.
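How continuity fixes the two remaining parameters can be sketched as follows. The piecewise form used here is an assumption reconstructed from the surrounding description: physical scaling with width d0 for z at or below z0, a power law d1 · z^a between z0 and z1, and physical scaling with an imputed width d2 for z at or above z1. The function names are hypothetical, and the sample values (z0 = 0.5, z1 = 3300 meters/pixel, d0 = 16 meters, d2 = 1650 meters) are those quoted for A1 roads.

```python
import math
from typing import Tuple


def solve_middle_segment(z0: float, z1: float, d0: float, d2: float) -> Tuple[float, float]:
    """Return (d1, a) that make p(z) continuous at z0 and z1."""
    p0 = d0 / z0  # width in pixels where the inner physical regime ends
    p1 = d2 / z1  # width in pixels where the outer physical regime begins
    a = math.log(p1 / p0) / math.log(z1 / z0)  # slope on the log-log graph
    d1 = p0 / z0 ** a
    return d1, a


def p(z: float, z0: float, z1: float, d0: float, d2: float) -> float:
    """Rendered width in pixels at zoom level z (meters/pixel)."""
    d1, a = solve_middle_segment(z0, z1, d0, d2)
    if z <= z0:
        return d0 / z        # physical scaling, real width
    if z >= z1:
        return d2 / z        # physical scaling, imputed width
    return d1 * z ** a       # non-physical scaling in between


A1 = dict(z0=0.5, z1=3300.0, d0=16.0, d2=1650.0)
d1, a = solve_middle_segment(**A1)
print(f"a = {a:.3f}, d1 = {d1:.2f}")
for z in (0.1, 0.5, 50.0, 3300.0):
    print(f"z = {z:>7} m/px -> {p(z, **A1):.3f} px")
```

On a log-log graph the middle segment is simply the straight line joining the points (z0, d0/z0) and (z1, d2/z1), which is why the four specified parameters suffice.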
- the approach discussed above with respect to A1 roads may be applied to the other road elements of the roadmap object.
- An example of applying these scaling techniques to the A1, A2, A3, A4, and A5 roads is illustrated in the log-log graph of FIG. 15.
- z0 = 0.5 meters/pixel for all roads, although it may vary from element to element depending on the context.
- the dotted lines all have a slope of -1 and represent physical scaling at different physical widths. From the top down, the corresponding physical widths of these dotted lines are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16 meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
- drawing the antialiased vertical lines of FIGS. 16A-D could also be accomplished by alpha-blending two images, one (image A) in which the line is 1 pixel wide, and the other (image B) in which the line is 3 pixels wide.
- Alpha blending assigns to each pixel on the display (1-alpha)*(corresponding pixel in image A) + alpha*(corresponding pixel in image B).
- as alpha is varied between zero and one, the effective width of the rendered line varies smoothly between one and three pixels.
- This alpha-blending approach only produces good visual results in the most general case if the difference between the two rendered line widths in images A and B is one pixel or less; otherwise, lines may appear haloed at intermediate widths.
- This same approach can be applied to rendering points, polygons, and many other primitive graphical elements at different linear sizes.
- the 1.5 pixel-wide line (FIG. 16B) and the 2 pixel-wide line (FIG. 16C) can be constructed by alpha-blending between the 1 pixel wide line (FIG. 16A) and the 3 pixel wide line (FIG. 16D).
- a 1 pixel wide line (FIG. 17A), a 2 pixel wide line (FIG. 17B) and a 3 pixel wide line (FIG. 17C) are illustrated in an arbitrary orientation.
- the same principle applies to the arbitrary orientation of FIGS. 17A-C as to the case where the lines are aligned exactly to the pixel grid, although the spacing of the line widths between which to alpha-blend may need to be finer than two pixels for good results.
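The alpha-blending of two line widths can be illustrated on a single row of pixels taken across a vertical line. In this sketch a pixel value of 1.0 means fully covered by the line and 0.0 means background; the function name `alpha_blend` and the use of total coverage as a measure of "effective width" are assumptions made for illustration.

```python
from typing import List, Sequence


def alpha_blend(row_a: Sequence[float], row_b: Sequence[float], alpha: float) -> List[float]:
    """Per pixel: (1 - alpha) * A + alpha * B."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]


# One row across a vertical line centered on the middle pixel.
image_a = [0.0, 0.0, 1.0, 0.0, 0.0]  # line 1 pixel wide
image_b = [0.0, 1.0, 1.0, 1.0, 0.0]  # line 3 pixels wide

for alpha in (0.0, 0.25, 0.5, 1.0):
    row = alpha_blend(image_a, image_b, alpha)
    # Total coverage along the row is used here as the effective width.
    print(f"alpha = {alpha:.2f}: row = {row}, effective width = {sum(row):.2f}")
```

With these inputs the effective width is 1 + 2 * alpha, so alpha = 0.25 gives the 1.5 pixel line and alpha = 0.5 the 2 pixel line, with the partially covered neighbors playing the role of antialiased edges.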
- FIG. 18 is substantially similar to FIG. 14 except that FIG. 18 includes a set of horizontal lines and vertical lines.
- the horizontal lines indicate line widths between 1 and 10 pixels, in increments of one pixel.
- the vertical lines are spaced such that line width over the interval between two adjacent vertical lines changes by no more than two pixels.
- the vertical lines represent a set of zoom values suitable for pre-rendition, wherein alpha-blending between two adjacent such prerendered images will produce characteristics nearly equivalent to rendering the lines representing roads at continuously variable widths.
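One way to pick such a set of zoom values is to step inward from the most zoomed-out level, each time moving as far as possible without letting the line width grow by more than a chosen number of pixels. The sketch below is an assumption-laden illustration: the function name `prerender_levels`, the bisection search, and the sample width function are not from the text, and the width function is assumed to be continuous and strictly decreasing in z.

```python
import math
from typing import Callable, List


def prerender_levels(p: Callable[[float], float], z_out: float, z_in: float,
                     max_delta: float = 2.0) -> List[float]:
    """Zoom levels from z_out (zoomed out) down to z_in (zoomed in).

    Between adjacent levels the width p(z) changes by at most `max_delta`
    pixels. `p` must be continuous and strictly decreasing in z.
    """
    levels = [z_out]
    z = z_out
    while z > z_in:
        target = p(z) + max_delta
        if p(z_in) <= target:
            levels.append(z_in)  # the innermost level is within reach
            break
        lo, hi = z_in, z         # invariant: p(lo) > target >= p(hi)
        for _ in range(60):      # bisection in log-zoom space
            mid = math.sqrt(lo * hi)
            if p(mid) > target:
                lo = mid
            else:
                hi = mid
        z = hi
        levels.append(z)
    return levels


def physical_width(z: float) -> float:
    """Example width function: a 16 meter road under physical scaling."""
    return 16.0 / z


levels = prerender_levels(physical_width, z_out=16.0, z_in=1.6)
print([round(z, 3) for z in levels])
print([round(physical_width(z), 3) for z in levels])
```

For several road classes at once, the same search could be run against whichever class changes width fastest at the current zoom, so that the spacing constraint holds for every line in the image.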
- This tiling technique may be employed for resolving an image at a particular zoom level, even if that level does not coincide with a pre-rendered image. If each image in the somewhat larger set of resolutions is pre-rendered at the appropriate resolution and tiled, then the result is a complete system for zooming and panning navigation through a roadmap of arbitrary complexity, such that all lines appear to vary in width continuously in accordance with the scaling equations disclosed herein.
- the user enjoys the appearance of smooth and continuous navigation through the various zoom levels. Further, little or no detail abruptly appears or disappears when zooming from one level to another. This represents a significant advancement over the state of the art.
- the various aspects of the present invention may be applied in numerous products, such as interactive software applications over the Internet, automobile-based software applications and the like.
- the present invention may be employed by an Internet website that provides maps and driving directions to client terminals in response to user requests.
- various aspects of the invention may be employed in a GPS navigation system in an automobile.
- the invention may also be incorporated into medical imaging equipment, whereby detailed information concerning, for example, a patient's circulatory system, nervous system, etc. may be rendered and navigated as discussed hereinabove.
- the applications of the invention are too numerous to list in their entirety, yet a skilled artisan will recognize that they are contemplated herein and fall within the scope of the invention as claimed.
- the present invention may also be utilized in connection with other applications in which the rendered images provide a means for advertising and otherwise advancing commerce. Additional details concerning these aspects and uses of the present invention may be found in U.S. Provisional Patent Application No.
- FIGS. 1-4 PRIOR ART
- METHOD FOR SPATIALLY ENCODING LARGE TEXTS, METADATA, AND OTHER COHERENTLY ACCESSED NON-IMAGE DATA
- image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
- the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
- the present invention relates to an extension of these selectively decompressible image compression and transmission technologies to textual or other non-image data.
- a large text e.g. the book Ulysses, by James Joyce.
- We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100.
- Figure 2 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel.
- the pixel intensity value in Figure 1 is simply the ASCII code of the corresponding character.
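The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `text_to_image`, the hard wrap of long lines at the column width, the padding value 0, and the substitution of `?` for characters outside the 8-bit range are all assumptions made for the example.

```python
def text_to_image(chapters, max_width=100, empty=0):
    """Return a 2-D list of pixel values, one column block per chapter.

    Each character becomes one pixel whose value is its character code;
    unused positions are filled with `empty` (an assumed padding value).
    """
    columns = []
    for chapter in chapters:
        rows = []
        for line in chapter.split("\n"):
            # Hard-wrap long lines at max_width; an empty line yields one blank row.
            chunks = [line[i:i + max_width] for i in range(0, len(line), max_width)] or [""]
            for chunk in chunks:
                codes = [ord(c) if ord(c) < 256 else ord("?") for c in chunk]
                rows.append(codes + [empty] * (max_width - len(codes)))
        columns.append(rows)

    # Chapters sit side by side, left to right; shorter chapters are padded below.
    height = max((len(rows) for rows in columns), default=0)
    image = []
    for y in range(height):
        row = []
        for rows in columns:
            row.extend(rows[y] if y < len(rows) else [empty] * max_width)
        image.append(row)
    return image
```

The resulting rows could then be handed to any 8-bit greyscale image writer before compression.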
- JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable.
- One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.
- Image compression, both lossy and lossless, operates best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized.
- the raw ASCII encoding is clearly not optimal from this perspective.
- One very simple way to improve the encoding is to reorder characters by frequency in the text or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc.
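A frequency-ordered code table of this kind might be built as follows. The sketch assumes the frequencies are measured on the text itself (the passage also allows using general English frequencies), and the helper names `build_frequency_code`, `encode`, and `decode` are hypothetical.

```python
from collections import Counter


def build_frequency_code(text):
    """Map characters to small integer codes, most frequent first.

    Code 0 is reserved for empty space (padding) and code 1 for the space
    character, as in the passage; codes 2 onward follow descending frequency.
    """
    counts = Counter(c for c in text if c != " ")
    # Ties are broken alphabetically so the table is deterministic (an assumption).
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    table = {" ": 1}
    for code, ch in enumerate(ordered, start=2):
        table[ch] = code
    return table


def encode(text, table):
    """Replace each character by its frequency-rank code."""
    return [table[c] for c in text]


def decode(codes, table):
    """Invert encode(); code 0 (padding) is not expected in the input."""
    inverse = {code: ch for ch, code in table.items()}
    return "".join(inverse[c] for c in codes)
```

With an 8-bit pixel depth this table can hold at most 254 distinct characters besides padding and the space.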
- Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding.
- the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII encoded text-image.
- the compressed file size can drop well below the ASCII text file size.
- the further optimizations can include, but are not limited to: using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0); and encoding as pixels the delta or difference between one character and the next, rather than the characters themselves.
- image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
- images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
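The delta variant mentioned above can be illustrated with a short sketch. Wrapping the differences modulo 256 so that each still fits in one 8-bit pixel, and starting from an implicit previous value of 0, are assumptions of this example; the passage does not specify either detail.

```python
def delta_encode(codes):
    """Store each value as its difference from the previous one, modulo 256."""
    previous = 0  # assumed starting value
    deltas = []
    for code in codes:
        deltas.append((code - previous) % 256)
        previous = code
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    previous = 0
    codes = []
    for delta in deltas:
        previous = (previous + delta) % 256
        codes.append(previous)
    return codes
```

Because the arithmetic wraps, the round trip is exact for any sequence of 8-bit values.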
- the present invention relates to an extension of these selectively decompressible image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in two previous applications: (1) Method for spatially encoding large texts, metadata, and other coherently accessed non-image data, attached as exhibit A, and (2) METHODS AND APPARATUS FOR NAVIGATING AN IMAGE, attached as exhibit B.
- (2) the concept of continuous multiscale roadmap rendering was introduced.
- the basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
- a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as
- a maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel; however, it is desirable from the user's standpoint to be able to zoom in farther.
- Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation.
- a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
- Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
- vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.
- the present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2).
- the database would need to include all relevant vector data, indexed spatially.
- Such databases present many implementation challenges.
- three images or channels are used for representing the map data, each with 8 bit depth: the prerendered layer is a precomputed literal rendition of the roadmap, as per (2); the pointer layer consists of 2*2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections; the data layer consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them.
- Figures 2-3 show the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs.
- Figures 3a and 3b are closeups from suburban and urban areas of the map, respectively.
- Figure 3a Closeup of suburban area of King County.
- Figure 3b Closeup of urban area of King County. If the user navigates to the view of the map shown in Figure 3a, then the client will request from the server the relevant portions of all three image layers, as shown.
- the prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is. The other two specify the vector data.
- the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
- the corresponding data block begins with two 16-bit values (four pixels) specifying the data block width and height.
- the width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height.
- the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
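A client-side reader for this pointer and data block layout might look like the sketch below. Several details are assumptions made for illustration because the passage leaves them open: the high-order byte comes first in each 16-bit value, the top row of the 2x2 pointer holds x and the bottom row holds y, offsets are two's-complement signed, and block pixels are read in row-major order. Images are plain lists of rows of 8-bit values, and the function names are hypothetical.

```python
def read_u16(high, low):
    """Combine two 8-bit pixels into one 16-bit value (high byte first, assumed)."""
    return (high << 8) | low


def to_signed16(value):
    """Interpret a 16-bit value as two's complement (assumed offset convention)."""
    return value - 0x10000 if value & 0x8000 else value


def decode_pointer(pointer_img, px, py):
    """Return the absolute (x, y) of the data block referenced by the 2x2
    pointer whose top-left pixel is at column px, row py."""
    dx = to_signed16(read_u16(pointer_img[py][px], pointer_img[py][px + 1]))
    dy = to_signed16(read_u16(pointer_img[py + 1][px], pointer_img[py + 1][px + 1]))
    return px + dx, py + dy


def read_data_block(data_img, bx, by):
    """Read the width, height, and payload bytes of the block at (bx, by)."""
    # Width comes first; since it is at least 2, both of its pixels lie in row 0.
    width = read_u16(data_img[by][bx], data_img[by][bx + 1])
    if width < 2:
        raise ValueError("data block width must be at least 2")

    def pixel(k):
        # k-th pixel of the block in row-major order.
        return data_img[by + k // width][bx + k % width]

    height = read_u16(pixel(2), pixel(3))
    payload = bytes(pixel(k) for k in range(4, width * height))
    return width, height, payload
```

The width must be known before the height can be located, which is why the constraint on a minimum width of 2 matters: with a narrower block the second byte of the width would already fall on the next row.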
- data blocks contain streetmap information including street names, address ranges, and vector representations.
- the pointer and data layers are precomputed, just as the prerendered layer is. Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images.
- Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these.
- the "greedy algorithm" used to insert a new rectangle as close as possible to a target point then proceeds as follows: Attempt to insert the rectangle centered on the target point.
- Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
- Figure 4 Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
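Since only the first step of the greedy algorithm survives in the text above, the following is a brute-force stand-in rather than the patent's procedure: it tries the position centered on the target first and otherwise takes the free position nearest to it, by exhaustive search over an occupancy grid. The names `fits` and `insert_near` and the Euclidean distance ordering are assumptions.

```python
def fits(occupied, x, y, width, height):
    """True if the rectangle lies inside the grid and covers no occupied cell."""
    rows, cols = len(occupied), len(occupied[0])
    if x < 0 or y < 0 or x + width > cols or y + height > rows:
        return False
    return not any(occupied[r][c]
                   for r in range(y, y + height)
                   for c in range(x, x + width))


def insert_near(occupied, width, height, target_x, target_y):
    """Place a width x height rectangle as close as possible to the target.

    Returns the top-left (x, y) of the placement and marks the cells as
    occupied, or returns None when no free position exists.
    """
    rows, cols = len(occupied), len(occupied[0])
    # Top-left corner that would center the rectangle on the target point.
    x0, y0 = target_x - width // 2, target_y - height // 2
    candidates = sorted(
        ((x, y) for y in range(rows - height + 1) for x in range(cols - width + 1)),
        key=lambda p: (p[0] - x0) ** 2 + (p[1] - y0) ** 2,
    )
    for x, y in candidates:
        if fits(occupied, x, y, width, height):
            for r in range(y, y + height):
                for c in range(x, x + width):
                    occupied[r][c] = True
            return x, y
    return None
```

The exhaustive sort is quadratic in image size and is only suitable for small test grids; a production packer would search outward incrementally or use a spatial index.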
- pointer/data block pairs are thus inserted in random order.
- Other orderings may further improve packing efficiency in certain circumstances; for example, inserting large blocks before small ones may minimize wasted space.
- Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes.
- a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.)
- the block can also be represented, with one byte left over, as 5x5.
- We refer to the set of all factorizations listed above, in addition to the approximate factorization 5x5, as "ceiling factorizations".
- block dimensions may be selected based only on a ceiling factorization of the data length; in general, "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12).
- the simplest data block sizing algorithm would thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes. More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point.
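The ceiling factorizations of a given data length can be enumerated directly. In this sketch shapes are (rows, columns) pairs following the notation above, the column count starts at 2 to respect the minimum block width, and the `max_waste` threshold on left-over bytes is an assumption introduced so that the 24-byte example yields exactly the shapes listed in the text.

```python
def ceiling_factorizations(n_bytes, max_waste=1):
    """List (rows, cols, wasted_bytes) shapes that can hold n_bytes.

    For each width of at least 2 pixels, the height is the smallest number
    of rows that fits the data; shapes wasting more than max_waste bytes
    are discarded (the threshold is an illustrative assumption).
    """
    shapes = []
    for cols in range(2, n_bytes + 1):
        rows = -(-n_bytes // cols)  # ceiling division
        waste = rows * cols - n_bytes
        if waste <= max_waste:
            shapes.append((rows, cols, waste))
    return shapes
```

For a 24-byte block this returns the seven exact factorizations with a width of at least 2 together with 5x5, which wastes one byte.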
- steps 1 and 4 of the algorithm above are then modified as follows: Sort the ceiling factorizations of the required data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes. Attempt to place rectangles of dimensions given by each ceiling factorization in turn at target point p. If any of these insertions succeeds, algorithm ends.
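The modified steps can be sketched as a ranking followed by a first-fit loop. The cost function below (difference between rows and columns plus a weighted count of wasted bytes) is only one plausible way to express "desirability", and `try_place` is a hypothetical callback standing in for whatever insertion routine the packer uses; it should return a position or None.

```python
def rank_shapes(shapes, n_bytes, waste_penalty=1.0):
    """Sort (rows, cols) shapes from most to least desirable.

    Squarer shapes and shapes wasting fewer bytes get a lower cost; the
    weighting between the two is an illustrative assumption.
    """
    def cost(shape):
        rows, cols = shape
        return abs(rows - cols) + waste_penalty * (rows * cols - n_bytes)

    return sorted(shapes, key=cost)


def place_first_fit(shapes, n_bytes, try_place, waste_penalty=1.0):
    """Try each shape in order of desirability; stop at the first success."""
    for rows, cols in rank_shapes(shapes, n_bytes, waste_penalty):
        position = try_place(rows, cols)
        if position is not None:
            return (rows, cols), position
    return None
```

With a low waste penalty the 24-byte example prefers 5x5, and with a higher penalty it prefers 6x4 or 4x6, which mirrors the trade-off between squareness and wasted bytes described above.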
- Each of the three map layers (prerendered roads, pointers, and data) is stored as a JPEG2000 or similar spatially-accessible representation. However, the storage requirements for the three layers differ.
- the prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed. At 15m/pixel, we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate.
- the left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte.
- the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric.
- the two bytes each assume values <16.
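One scheme consistent with this description is to interleave the bits of the 16-bit value across the two pixels, so that small values keep both bytes small. The figure defining the exact bit assignment is not reproduced here, so the choice below (odd-numbered bits to the first pixel, even-numbered bits to the second) is an assumption, as are the function names.

```python
def split_interleaved(value):
    """Split a 16-bit value into two bytes by alternating bits.

    Even-numbered bits (0, 2, 4, ...) go to the second pixel and odd-numbered
    bits (1, 3, 5, ...) to the first pixel (assumed assignment).
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    first = second = 0
    for i in range(8):
        second |= ((value >> (2 * i)) & 1) << i
        first |= ((value >> (2 * i + 1)) & 1) << i
    return first, second


def join_interleaved(first, second):
    """Inverse of split_interleaved()."""
    value = 0
    for i in range(8):
        value |= ((second >> i) & 1) << (2 * i)
        value |= ((first >> i) & 1) << (2 * i + 1)
    return value
```

Under this assignment any value below 256 produces two bytes that are each below 16, whereas the conventional split would leave the high-order byte at zero and let the low-order byte range over all 256 values.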
- Similar techniques apply to 32-bit or larger integer values.
- These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision.
- Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques.
- 4 subpixel bits are used, for 1/16 pixel precision.
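The sign-bit and fixed-point conventions can be combined in a small sketch. It covers only the conversion between a signed subpixel offset and an unsigned integer code with the sign in bit position 0; the subsequent split of that code into alternating bytes is not repeated here. The 16-bit limit and the function names are assumptions.

```python
SUBPIXEL_BITS = 4  # 1/16 pixel precision, as stated in the text


def encode_signed_fixed(delta_pixels):
    """Encode a signed offset in pixels as an unsigned integer code.

    Bit 0 holds the sign (1 for negative) and the remaining bits hold the
    absolute value in units of 1/16 pixel.
    """
    quantized = round(delta_pixels * (1 << SUBPIXEL_BITS))
    code = (abs(quantized) << 1) | (1 if quantized < 0 else 0)
    if code > 0xFFFF:  # assumed 16-bit container
        raise ValueError("offset too large for a 16-bit code")
    return code


def decode_signed_fixed(code):
    """Recover the signed offset in pixels from its code."""
    magnitude = code >> 1
    quantized = -magnitude if code & 1 else magnitude
    return quantized / (1 << SUBPIXEL_BITS)
```

Placing the sign in the lowest bit keeps small negative and small positive offsets close together numerically, which suits differential coding where the sign changes often.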
- the JPEG2000 representation (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text. (This file is part of the United States Census Bureau's 2002 TIGER/Line database.) Unlike the original ZIP, however, the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
- the original prerendered multiscale map invention introduced in [2] included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features).
- the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
- pointer and data images at much lower resolution than those of Figures 1-3 might include data only for state and national highways, excluding all local roads.
- coarser data may also be "abstracts", for example specifying only road names, not vectors.
- Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
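Selecting the subset of vector data for a coarser level could be as simple as the filter below. The feature schema (dictionaries with `name`, `road_class`, and `vector` keys), the class labels, and the function name are all hypothetical; the passage does not define a data model.

```python
def abstract_for_level(features, level_classes, names_only=False):
    """Keep only features whose road class belongs at this resolution.

    With names_only=True the vector geometry is dropped, giving the kind of
    "abstract" that carries road names but not vectors.
    """
    kept = [f for f in features if f["road_class"] in level_classes]
    if names_only:
        return [{"name": f["name"], "road_class": f["road_class"]} for f in kept]
    return kept
```

A coarse level might be built with `abstract_for_level(features, {"national", "state"}, names_only=True)`, while the finest level would pass every class and keep the geometry.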
- although the implementation outlined above suggests an 8-bit greyscale prerendered map image at every resolution, the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.
Landscapes
- Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- Automation & Control Theory (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Mathematical Physics (AREA)
- Computer Graphics (AREA)
- Artificial Intelligence (AREA)
- Processing Or Creating Images (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
- Instructional Devices (AREA)
Abstract
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/803,010 US7133054B2 (en) | 2004-03-17 | 2004-03-17 | Methods and apparatus for navigating an image |
US10/854,117 US7042455B2 (en) | 2003-05-30 | 2004-05-26 | System and method for multiple node display |
US61748504P | 2004-10-08 | 2004-10-08 | |
US62286704P | 2004-10-28 | 2004-10-28 | |
PCT/US2005/008924 WO2005089434A2 (fr) | 2004-03-17 | 2005-03-17 | Procede de codage et de service de donnees geospatiales ou autres donnees vectorielles sous forme d'images |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1756521A2 (fr) | 2007-02-28 |
Family
ID=34994346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05725818A Withdrawn EP1756521A2 (fr) | 2004-03-17 | 2005-03-17 | Procede de codage et de service de donnees geospatiales ou autres donnees vectorielles sous forme d'images |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1756521A2 (fr) |
JP (1) | JP2007529786A (fr) |
CA (1) | CA2559678C (fr) |
WO (1) | WO2005089434A2 (fr) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5354924B2 (ja) * | 2007-02-16 | 2013-11-27 | 国立大学法人 名古屋工業大学 | デジタルマップ作成システム |
US8203552B2 (en) | 2007-08-30 | 2012-06-19 | Harris Corporation | Geospatial data system for selectively retrieving and displaying geospatial texture data in successive additive layers of resolution and related methods |
GB2457646B (en) * | 2007-10-30 | 2010-03-03 | Imagination Tech Ltd | Method and apparatus for compressing and decompressing data |
JP5561915B2 (ja) * | 2008-06-06 | 2014-07-30 | 三菱電機株式会社 | 地図描画装置及びプログラム |
US8935292B2 (en) | 2008-10-15 | 2015-01-13 | Nokia Corporation | Method and apparatus for providing a media object |
US9218682B2 (en) * | 2008-10-15 | 2015-12-22 | Nokia Technologies Oy | Method and apparatus for generating an image |
CN102869952B (zh) * | 2010-04-08 | 2015-07-29 | 司法技术Wai公司 | 生成包含工具痕的物体的修改的3d图像 |
US9070225B2 (en) * | 2011-10-03 | 2015-06-30 | Oracle International Corporation | Interactive display elements in a visualization component |
EP2804151B1 (fr) * | 2013-05-16 | 2020-01-08 | Hexagon Technology Center GmbH | Procédé de rendu des données d'une surface tridimensionnelle |
KR101866363B1 (ko) * | 2017-11-24 | 2018-06-12 | 공간정보기술 주식회사 | 사용자 기반 조건에 따른 3차원(3d) 모델링 생성과 제공 시스템 |
KR102030594B1 (ko) * | 2018-01-31 | 2019-11-08 | 가이아쓰리디 주식회사 | 3차원 지리 정보 시스템 웹 서비스를 제공하는 방법 |
CN108732931B (zh) * | 2018-05-17 | 2021-03-26 | 北京化工大学 | 一种基于jit-rvm的多模态间歇过程建模方法 |
WO2020061336A1 (fr) | 2018-09-20 | 2020-03-26 | Paper Crane, LLC | Analyse automatisée de données géospatiales |
KR102081451B1 (ko) * | 2019-02-15 | 2020-02-25 | 김성용 | 근거리 네트워크와 연관된 액세스 포인트를 이용하여 컨텐츠를 공유하는 방법, 상기 방법을 사용하는 액세스 포인트 장치, 상기 액세스 포인트와 연동하는 클라이언트에 의해 컨텐츠를 업로드하는 방법, 및 상기 클라이언트에 의해 컨텐츠를 수신하는 방법, 상기 방법을 사용하는 클라이언트 장치 |
EP3944198A1 (fr) * | 2020-07-21 | 2022-01-26 | Transport For London | Zoom et mise à l'échelle |
CN112066997A (zh) * | 2020-08-25 | 2020-12-11 | 海南太美航空股份有限公司 | 高清航线地图的导出方法及系统 |
US20220205808A1 (en) * | 2020-12-25 | 2022-06-30 | Mapsted Corp. | Localization using tessellated grids |
CN113721802B (zh) * | 2021-08-18 | 2024-09-27 | 广州南方卫星导航仪器有限公司 | 一种矢量捕捉方法 |
CN117115241B (zh) * | 2023-09-06 | 2024-03-29 | 北京透彻未来科技有限公司 | 一种追寻数字病理图像在缩放过程中中心焦点的方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0384416A (ja) * | 1989-08-28 | 1991-04-10 | Matsushita Electric Ind Co Ltd | 車載用地図表示装置 |
JPH0394376A (ja) * | 1989-09-07 | 1991-04-19 | Canon Inc | データベース検索方法 |
JPH04160479A (ja) * | 1990-10-23 | 1992-06-03 | Hokkaido Nippon Denki Software Kk | 文字列表示位置設定方式 |
JP2956587B2 (ja) * | 1996-06-10 | 1999-10-04 | 凸版印刷株式会社 | 広告情報の登録方法および供給方法 |
JPH1089976A (ja) * | 1996-09-13 | 1998-04-10 | Hitachi Ltd | 情報表示装置およびナビゲーションシステム |
JP3342836B2 (ja) * | 1998-07-31 | 2002-11-11 | 松下電器産業株式会社 | 地図表示装置 |
JP2000337895A (ja) * | 1999-05-24 | 2000-12-08 | Matsushita Electric Ind Co Ltd | 車載用地図表示装置 |
JP4209545B2 (ja) * | 1999-05-24 | 2009-01-14 | クラリオン株式会社 | ランドマーク表示方法及びナビゲーション装置 |
US7375728B2 (en) * | 2001-10-01 | 2008-05-20 | University Of Minnesota | Virtual mirror |
US6909965B1 (en) * | 2001-12-28 | 2005-06-21 | Garmin Ltd. | System and method for creating and organizing node records for a cartographic data map |
JP4224985B2 (ja) * | 2002-05-22 | 2009-02-18 | ソニー株式会社 | 地図描画装置、地図表示システム、およびプログラム |
-
2005
- 2005-03-17 CA CA2559678A patent/CA2559678C/fr not_active Expired - Fee Related
- 2005-03-17 JP JP2007504113A patent/JP2007529786A/ja active Pending
- 2005-03-17 EP EP05725818A patent/EP1756521A2/fr not_active Withdrawn
- 2005-03-17 WO PCT/US2005/008924 patent/WO2005089434A2/fr not_active Application Discontinuation
Non-Patent Citations (1)
Title |
---|
See references of WO2005089434A2 * |
Also Published As
Publication number | Publication date |
---|---|
JP2007529786A (ja) | 2007-10-25 |
CA2559678A1 (fr) | 2005-09-29 |
WO2005089434A2 (fr) | 2005-09-29 |
CA2559678C (fr) | 2013-12-17 |
WO2005089434A3 (fr) | 2006-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2005089434A2 (fr) | Procede de codage et de service de donnees geospatiales ou autres donnees vectorielles sous forme d'images | |
JP4831071B2 (ja) | イメージ・データの通信および/または格納を管理するシステムおよび方法 | |
CA2812008C (fr) | Procedes et appareil de navigation d'une image | |
AU2006230233B2 (en) | System and method for transferring web page data | |
US7023456B2 (en) | Method of handling context during scaling with a display | |
CA2533279C (fr) | Systeme et methode pour traiter des donnees cartographiques | |
US7724965B2 (en) | Method for encoding and serving geospatial or other vector data as images | |
US7554543B2 (en) | System and method for exact rendering in a zooming user interface | |
Potmesil | Maps alive: viewing geospatial information on the WWW | |
US20030151626A1 (en) | Fast rendering of pyramid lens distorted raster images | |
CN101501664A (zh) | 用于传送网页数据的系统和方法 | |
Möser et al. | Context aware terrain visualization for wayfinding and navigation | |
JP4861978B2 (ja) | イメージをナビゲートするための方法および装置 | |
JP2008535098A (ja) | ウェブページデータを転送するシステムおよび方法 | |
Huang et al. | Visualizing massive terrain with transportation infrastructure by using continuous level of detail | |
Solomon | The chipmap™: Visualizing large VLSI physical design datasets | |
Triantafyllos et al. | A VRML Terrain Visualization Approach |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20060907 |
AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
AX | Request for extension of the european patent | Extension state: AL BA HR LV MK YU |
DAX | Request for extension of the european patent (deleted) | |
APBK | Appeal reference recorded | Free format text: ORIGINAL CODE: EPIDOSNREFNE |
APBN | Date of receipt of notice of appeal recorded | Free format text: ORIGINAL CODE: EPIDOSNNOA2E |
APBR | Date of receipt of statement of grounds of appeal recorded | Free format text: ORIGINAL CODE: EPIDOSNNOA3E |
APAF | Appeal reference modified | Free format text: ORIGINAL CODE: EPIDOSCREFNE |
APBX | Invitation to file observations in appeal sent | Free format text: ORIGINAL CODE: EPIDOSNOBA2E |
APBZ | Receipt of observations in appeal recorded | Free format text: ORIGINAL CODE: EPIDOSNOBA4E |
APBT | Appeal procedure closed | Free format text: ORIGINAL CODE: EPIDOSNNOA9E |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20071002 |