US20070083808A1 - System and method for measuring SVG document similarity - Google Patents
- Publication number
- US20070083808A1 (application No. 11/245,859)
- Authority
- US
- United States
- Prior art keywords
- svg
- logically
- minimized
- dom tree
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/70—Media network packetisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/835—Query processing
- G06F16/8373—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/61—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
- H04L65/611—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for multicast or broadcast
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
Definitions
- the present invention relates generally to systems for Scalable Vector Graphics (SVG) documents. More particularly, the present invention relates to systems and methods for measuring the degree of similarity in SVG documents.
- SVG Scalable Vector Graphics
- Measuring a level or degree of similarity between objects can be very useful in many applications.
- Image similarity is performed based upon the images' inherent color and texture properties.
- 3D objects can be compared based on shape matching algorithms that consider topology and feature matching. Textual content can be matched by metrics ranging from a simple “diff” program to more advanced pattern matching and semantic grouping algorithms.
- FIGS. 1(a) and 1(b) exemplify this issue. Although the two SVG documents in FIGS. 1(a) and 1(b) look identical, their underlying textual representations are quite different from one another.
- FIG. 1(a) makes use of <defs> and <use> elements for predefining the shapes and reusing them with different colors and positions.
- FIG. 1 ( b ) renders each shape separately without reusability. If a system relies on traditional document comparison methods to determine the similarity between these documents, the documents might be classified as being vastly different. In addition, traditional pixel-based methods for determining levels of similarity are not optimal, as one would have to convert the SVG-rendered content into raster graphics, and the process becomes even more complicated when animations are involved.
- XML Diff detects structural changes in the XML sub-trees and produces a Diffgram to describe the differences between the two sub-trees.
- a second method involves the use of a matching algorithm for measuring the structural similarity between an XML document and a DTD.
- a third approach involves linearizing the structure of each XML document by representing it as a numerical sequence, and then comparing the sequences through the analysis of their frequencies.
- a fourth approach involves a structural similarity metric for XML documents based upon an “XML aware” edit distance between ordered labeled trees to cluster documents by DTD.
- a fifth method measures similarity between vectors after representing documents based upon their structure in vector form.
- XMill incorporates and combines existing compressors in order to apply them to heterogeneous XML data.
- XMill uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, as well as user-defined compressors for application specific data.
- the present invention provides an improved process of determining whether any two given SVG documents are visually similar, as well as determining the degree of similarity and the nature of similarity between the documents.
- the present invention introduces a mechanism for converting individual SVG documents into their minimal logical representations based upon certain SVG optimization heuristics. The similarity of these logical representations is then computed on the representations' reduced logical DOM trees. Tree isomorphism is useful to determine how similar tree structures are with respect to one another. Several metrics, such as the maximum common sub-tree and tree distance measurement, can be used to determine the maximum number of common nodes between two given trees. The larger the maximum common sub-tree is, the more similar are the two trees.
- the approach of the present invention for reducing SVG to its minimal logical form is different from the XML compression methods discussed above in that the minimal representation is computed in a substantially different manner with the present invention. Additionally, unlike the compression based methods discussed previously, the minimal logical SVG form is still uncompressed and is capable of still being rendered using the system and method of the present invention.
- the present invention is capable of vastly improving and even optimizing SVG's performance, so that the underlying logical representations that comprise large datasets of SVG content can be modified, updated or traversed more efficiently based upon the computed similarity information.
- the present invention is also unique in that it concerns SVG-type content where both the underlying XML representation and its visual appearance need to be considered before actually computing a degree of similarity.
- the present invention provides new techniques for computing a minimal logical representation of the SVG document, computing the similarity by applying tree isomorphism on the normalized SVG trees, and representing this similarity information.
- the system and method of the present invention can be applied to various use cases particularly relevant to the mobile devices community.
- this similarity information allows the client to be more selective about whether or not to use the arriving packet of data for reconstructing a scene graph.
- the client may be able to repair its existing scene graph based on information provided in the RTP payload header rather than destroying the entire scene graph and reconstructing a new one.
- similar SVG documents can be retrieved based upon either an example SVG document or by a query of keywords. This facilitates more efficient and intelligent content search applications.
- in a compression scenario, by observing intra-document similarity, redundant information within a given SVG document can be removed by merely storing the element ID instead.
- FIGS. 1 ( a ) and 1 ( b ) show examples of two SVG documents that appear graphically similar, but the underlying textual representations are vastly different;
- FIG. 2 ( a ) and 2 ( b ) show examples of two SVG documents that appear graphically different, but the underlying textual representations are similar;
- FIG. 3 shows the conversion of the underlying textual representation of a SVG document into a minimal logical representation by ignoring redundant, unreferenced and unused SVG elements
- FIG. 4 demonstrates an example of resolving attributes while computing a minimal logical representation from a SVG document
- FIG. 5 shows an example of ignoring less important information while computing a minimal logical representation
- FIG. 6 shows an example of two SVG documents sharing some similar content between them
- FIG. 7 shows an example of two SVG documents sharing some similar structure between them
- FIG. 8 shows an example of two SVG documents sharing similar content as well as structure between them
- FIG. 9 ( a ) presents a graphical depiction of an original SVG scene
- FIG. 9(b) shows the effect of a Scene Update: SetAttribute action
- FIG. 9 ( c ) shows the effect of a subsequent Scene Update: AddElement action
- FIG. 9 ( d ) shows the effect of a subsequent Scene update: DeleteElement action
- FIG. 9 ( e ) shows the effect of a subsequent Scene update: ReplaceElement action
- FIG. 10 is a representation showing the assigning of importance weights to individual pieces of content in a SVG document
- FIG. 11 is a flow chart showing a process for implementing one embodiment of the present invention.
- FIG. 12 is an overview diagram of a system within which the present invention may be implemented.
- FIG. 13 is a perspective view of an electronic device that can incorporate the principles of the present invention.
- FIG. 14 is a schematic representation of the circuitry of the electronic device of FIG. 13 .
- the present invention provides a system and method for reducing SVG documents to their minimal logical DOM tree structure in order to avoid the problem of disparity between the visual and underlying textual representations of each document.
- the present invention then applies similarity metrics based on tree isomorphism to compute the degree of similarity between the SVG documents.
- the present invention can be used in applications such as streaming, content searching, and compression. Each of these use cases is discussed below.
- A process for computing similarity between the SVG samples according to the present invention is depicted in FIG. 11.
- a minimal logical representation of two SVG documents is determined.
- tree isomorphism is performed on each SVG's Document Object Model (DOM).
- DOM Document Object Model
- the system applies similarity metrics to the created hierarchical trees for each document.
- a representation of the computed similarity information for the two documents is created. This information can then be used by one of several applications at step 140. Each of these process portions is discussed below.
- the determining of a minimal logical representation of a SVG document involves four different sub-processes in one embodiment of the invention.
- the first sub-process involves collapsing the document. Redundant, unreferenced and unused SVG elements and comments are ignored. This sub-process is applied in order to remove elements that do not contribute to the visual representation of the SVG document.
- the document may often contain element definitions within the <defs> block that are not rendered or may be set to invisible. Such elements can be removed before comparing the document at issue to other SVG documents.
- This collapsing sub-process is depicted in FIG. 3 , where comments and an instruction with a “visibility:hidden” setting are removed.
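- The collapsing sub-process can be illustrated with a minimal Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser (which discards comments by default) and treats an element as removable only when it carries an explicit `visibility:hidden` style or a `visibility="hidden"` attribute. The helper names `is_hidden` and `collapse` are hypothetical, and the sketch does not handle unreferenced definitions or SVG's visibility inheritance rules.

```python
import xml.etree.ElementTree as ET

def is_hidden(elem):
    """True if the element is explicitly marked as not visible (simplified check)."""
    style = elem.get("style", "").replace(" ", "")
    return "visibility:hidden" in style or elem.get("visibility") == "hidden"

def collapse(root):
    """Remove explicitly hidden elements in place.

    Comments never reach the tree: the default ElementTree parser drops them.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if is_hidden(child):
                parent.remove(child)
    return root

doc = """<svg xmlns="http://www.w3.org/2000/svg">
  <!-- a comment that does not affect rendering -->
  <rect id="a" width="10" height="10"/>
  <circle id="b" r="5" style="visibility:hidden"/>
</svg>"""

root = collapse(ET.fromstring(doc))
remaining_ids = [e.get("id") for e in root.iter() if e.get("id")]
print(remaining_ids)
```

Only the visible rectangle survives in this example, so two documents that differ solely in comments or hidden elements collapse to the same tree.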
- a second sub-process involves “substitution.” This sub-process applies to expanding internal references and definitions within the SVG document, resolving references and resolving relative URLs to absolute URLs for the sake of fairer comparison.
- <defs> and <use> elements are meant for the reusability of element definitions and can be used as many times as one desires in the SVG document. If there is a <use> element, the used content gets cloned and virtually appended to the <use> element. This cloned content is a “shadow tree” of the original element that gets “<use>ed.”
- Substitution involves expanding the element definition inline where a <use> element is mentioned. This can be done in order to accurately compare the document with another visually similar SVG document that does not use <defs> and <use> elements, for example.
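- A rough Python sketch of substitution follows. It is an illustration under simplifying assumptions, not the patented procedure: each <use> is replaced by a deep copy of the element it references, attributes found on the <use> are simply copied onto the clone (real SVG treats `x`/`y` as a translation and applies inheritance rules), nested <use> chains are not followed, and the <defs> block is dropped afterwards. The function name `substitute_uses` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def substitute_uses(root):
    """Replace each <use> with a copy of its referenced element, then drop <defs>."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != SVG + "use":
                continue
            target = by_id.get(child.get(XLINK_HREF, "").lstrip("#"))
            if target is None:
                continue  # unresolved reference: leave the <use> untouched
            clone = copy.deepcopy(target)
            clone.attrib.pop("id", None)
            # Simplification: copy the <use> attributes straight onto the clone.
            for name, value in child.attrib.items():
                if name != XLINK_HREF:
                    clone.set(name, value)
            parent[index] = clone
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == SVG + "defs":
                parent.remove(child)
    return root

doc = """<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs><rect id="box" width="10" height="10"/></defs>
  <use xlink:href="#box" fill="red"/>
  <use xlink:href="#box" fill="blue"/>
</svg>"""

root = substitute_uses(ET.fromstring(doc))
tags = [child.tag.split("}")[1] for child in root]
fills = [child.get("fill") for child in root]
print(tags, fills)
```

After substitution the document contains two plain rectangles, which is the shape a document written without <defs> and <use> would have had from the start.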
- the third sub-process in creating a minimal logical representation involves resolving attributes. Often, even if references are properly substituted from the <defs> block and the same element id is used with ‘xlink:href’, the attributes in the documents might be different. This can lead to significant variations in the visual representation. For example, in SVG animations, changes in height, width, begin time, etc. may lead to completely different visuals at a particular timestamp. One such example is depicted in FIG. 4. In this scenario, even though both documents use the same animate definition, the attributes in the actual animation are different, as indicated in the corresponding visual representations at a two-second mark. As the first animation starts one second earlier than the second animation, the rectangle has already changed color from purple to yellow by the two-second mark.
- the fourth sub-process in creating a minimal logical representation involves ignoring less significant differences.
- Individual elements or attributes in SVG often may not be as important as other elements or attributes. For example, differences in font styles and size may be ignored when computing similarity for use in certain applications.
- SVG embeds other media such as raster images, audio and video. These elements would require audio/video similarity metrics to be applied separately to them in addition to computing SVG similarity, or they can simply be ignored depending upon the particular application or system.
- the two SVG documents are visually similar, except that the word ‘Hello’ is larger in the first image when compared to the second image. Looking at the textual representations, one can see that the font sizes vary. Depending upon the application, this may not be very important and, when creating the minimal logical representation, the ‘font-size’ attribute can be ignored if the application so permits.
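- As a small illustration of ignoring less significant differences, the Python sketch below strips a configurable set of attributes before comparison. The set `IGNORED_ATTRIBUTES` and the function `strip_ignored` are hypothetical; which attributes count as unimportant is an application-level decision.

```python
import xml.etree.ElementTree as ET

# Hypothetical, application-specific choice of attributes to disregard.
IGNORED_ATTRIBUTES = {"font-size", "font-family", "font-style"}

def strip_ignored(root, ignored=IGNORED_ATTRIBUTES):
    """Delete every attribute whose name is in the ignored set (in place)."""
    for elem in root.iter():
        for name in list(elem.attrib):
            if name in ignored:
                del elem.attrib[name]
    return root

NS = 'xmlns="http://www.w3.org/2000/svg"'
a = ET.fromstring(f'<svg {NS}><text x="5" y="20" font-size="30">Hello</text></svg>')
b = ET.fromstring(f'<svg {NS}><text x="5" y="20" font-size="12">Hello</text></svg>')

differs_before = ET.tostring(a) != ET.tostring(b)
same_after = ET.tostring(strip_ignored(a)) == ET.tostring(strip_ignored(b))
print(differs_before, same_after)
```

The two ‘Hello’ documents differ only in font size, so they serialize identically once that attribute is dropped.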
- Step 110 in FIG. 11 involves Tree Isomorphism on each SVG's Document Object Model (DOM).
- the elements are rendered in the order in which they appear in the scene graph or Document Object Model (DOM).
- Each element in the data format can be thought of as a canvas on which paint is applied. If objects are grouped together with a <g> tag, they are first rendered as a separate group canvas, then composited on the main canvas using filters or alpha masks associated with the group.
- the underlying SVG definitions are laid out in an XML-like node-labeled tree structure.
- the DOM can be viewed as a directed acyclic tree, allowing traversal of the nodes of the tree that constitute the various SVG elements.
- Step 120 in FIG. 11 involves applying similarity metrics.
- the present invention can use any of the following metrics while still providing superior performance.
- If A and B are two logically minimized SVG DOM trees, the set operation ∩ (intersection) can be defined. This operation denotes a SVG sub-tree common to both trees A and B.
- The size (|T|) of a SVG DOM tree is the number of nodes (SVG elements) in a tree T.
- * 100 S 2
- If A and B are two logically minimized SVG DOM trees, D(A, B) is their tree edit distance (i.e., the number of nodes that are different) and Cost(A, B) is the cost to delete the different nodes in A and insert the new nodes from B.
- S(A, B) is low when the SVG documents are similar, with a high percentage of matching nodes, and high when they are very different, with a low percentage of matching nodes. In this scenario, the value ranges between 0 and 1.
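- The two kinds of metric described above can be sketched in Python as follows. This is a hedged illustration, not the formulas of the invention: the exact normalizations are not recoverable from the text here, so the sketch assumes that the percentage metric divides the common node count by the size of the larger tree, and that the distance metric divides the number of differing nodes by the combined size of both trees (which keeps it between 0 and 1). The common sub-tree is approximated by a greedy top-down match rather than a true maximum common sub-tree, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def size(node):
    """|T|: number of nodes (elements) in the tree rooted at node."""
    return sum(1 for _ in node.iter())

def label(node):
    """A node matches another when tag and attributes agree."""
    return (node.tag, tuple(sorted(node.attrib.items())))

def common_nodes(a, b):
    """Count nodes of a top-down common sub-tree, pairing children greedily."""
    if label(a) != label(b):
        return 0
    count = 1
    unmatched = list(b)
    for child_a in a:
        for child_b in unmatched:
            shared = common_nodes(child_a, child_b)
            if shared:
                count += shared
                unmatched.remove(child_b)
                break
    return count

def similarity_percent(a, b):
    """Assumed normalization: common nodes over the larger tree, as a percentage."""
    return 100.0 * common_nodes(a, b) / max(size(a), size(b))

def normalized_distance(a, b):
    """Assumed normalization: differing nodes over all nodes, in [0, 1]."""
    differing = size(a) + size(b) - 2 * common_nodes(a, b)
    return differing / (size(a) + size(b))

tree_a = ET.fromstring('<svg><rect width="10"/><circle r="5"/></svg>')
tree_b = ET.fromstring('<svg><rect width="10"/><circle r="7"/></svg>')
shared = common_nodes(tree_a, tree_b)
percent = similarity_percent(tree_a, tree_b)
distance = normalized_distance(tree_a, tree_b)
print(shared, percent, distance)
```

In the example the root and the rectangle match while the circles differ in radius, so two of three nodes are shared. A higher percentage and a lower distance both indicate more similar documents.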
- the similarity between any two minimal logical SVG scene graphs can be characterized based on three categories: Content similarity, structural similarity, and positional content similarity.
- Content similarity involves determining the common content intersection between two logical SVG scene graphs.
- the equality of contents in two given SVG scene graphs is analogous to sub-string matching.
- the two SVG scene graphs illustrated in FIG. 6 share some common road sign symbol definitions.
- a fragment of the common SVG content shared between the two SVG documents is provided below the scene graphs.
- Structural similarity involves determining the common structural intersection between two logical SVG scene graphs.
- Structurally similar SVG scene graphs have similar tree hierarchies, i.e., they share some common elements and children of elements. However, the attributes of the common elements might be different.
- the documents have some structural similarity, but the icons in each of the documents have different attributes and IDs. For example, the first SVG document has icon elements that define weather symbols, and the second SVG document has icon elements that define road signs. Additionally, the fill color attributes for various elements in the two files are different. However, the two documents do have a similar underlying map structure with road connectors and cities.
- Positional content similarity involves the consideration of content as well as position. Positional content similarity is a measure of the common content positioned in the common structure and is most suitable for determining SVG packet similarity while streaming, as it provides the most information. The higher the percentage of positional content similarity between two SVG scene graphs, the more closely the SVG scene graphs match each other. In FIG. 8, both SVG documents/scene graphs are positionally similar with the map layout in consideration. However, SVG Document B has a zooming capability on one region of the map.
- Step 130 in FIG. 11 involves the creation of a representation of the computed similarity information. While the similarity metrics inform the user of the type of similarity and how similar the SVG scene graphs are to one another, the similarity information tells the user what is similar. For example, in a streaming scenario, if the positional content similarity between two SVG samples is greater than 80%, then the streaming server can choose to transmit the similarity information to the client. The client can then modify the scene graph based upon this incoming similarity information.
- SVG is XML-based, and the similarity information between two SVG scene graphs can be intuitively represented using node positions in the SVG scene graph.
- the syntax grammar that can be used to specify node positions for SVG similarity information is defined as follows:
- Step 140 in FIG. 11 involves having various applications use the generated similarity information. Three such applications are discussed herein. However, it is also possible for other applications to be capable of using the generated similarity information of the present invention.
- SVG content has the capability of providing a framework for audio and video content and can be streamed across a network to many clients at a given time.
- one of several error correction mechanisms includes retransmission of the same sample at specific time intervals, as long as it is temporally valid in the presentation.
- the streaming server transmits the current SVG scene at regular intervals until it becomes temporally invalid.
- RTP real-time transport protocol
- the similarity information itself can be used to dynamically update the SVG DOM on the client by using update syntax in the form of add, delete, or replace operations sent by the server.
- the streaming server can send information in the RTP packet with similarity or dissimilarity. Based upon the similarity between packets, the positions in the client's DOM can be dynamically modified. This optimizes the client's performance by minimizing the number of reconstructions of the DOM, and just repairing or modifying the DOM tree. It should be noted that if the SVG scene graphs being compared are very similar, it may be more optimal to send dissimilarity information rather than similarity information.
- the scene update instructions are as follows.
- SetAttribute: This element is used to update attributes of the scene specified by the target element (xlink:href).
- the attributeName and attribute values correspond to the attribute and value of the target element to be added or replaced.
- AddElement: This element is used to update the scene with a new element (myCircle) as a child of the specified parent element (xlink:href). If insertBefore is specified, the new element is inserted before this element.
- ReplaceElement: This element is used to replace an existing element (xlink:href) from the scene with a new element (myCircle). This operation is essentially an in-order combination of DeleteElement and AddElement.
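- To make the effect of these update instructions concrete, the following Python sketch applies set, add, delete, and replace operations to a client-side tree held in `xml.etree.ElementTree`. It is only an illustration: the function names, their parameters, and the `myLine` element are hypothetical, targets are looked up by `id` rather than by parsing an `xlink:href`, and the wire format of the instructions is not modeled.

```python
import xml.etree.ElementTree as ET

def find_with_parent(root, element_id):
    """Return (parent, element) for the given id; parent is None for the root."""
    if root.get("id") == element_id:
        return None, root
    for parent in root.iter():
        for child in parent:
            if child.get("id") == element_id:
                return parent, child
    raise KeyError(element_id)

def set_attribute(root, target_id, name, value):
    find_with_parent(root, target_id)[1].set(name, value)

def add_element(root, parent_id, new_element, insert_before_id=None):
    _, parent = find_with_parent(root, parent_id)
    if insert_before_id is None:
        parent.append(new_element)
    else:
        position = [c.get("id") for c in parent].index(insert_before_id)
        parent.insert(position, new_element)

def delete_element(root, target_id):
    parent, target = find_with_parent(root, target_id)
    parent.remove(target)

def replace_element(root, target_id, new_element):
    # Delete followed by add at the same position, as the text describes.
    parent, target = find_with_parent(root, target_id)
    position = list(parent).index(target)
    parent.remove(target)
    parent.insert(position, new_element)

scene = ET.fromstring(
    '<g id="scene"><rect id="myRect" x="10"/><ellipse id="myEllipse" rx="5"/></g>'
)
set_attribute(scene, "myRect", "x", "50")
moved_x = scene[0].get("x")
add_element(scene, "scene", ET.Element("circle", {"id": "myCircle"}),
            insert_before_id="myEllipse")
delete_element(scene, "myRect")
replace_element(scene, "myEllipse", ET.Element("line", {"id": "myLine"}))
ids = [child.get("id") for child in scene]
print(moved_x, ids)
```

Each operation touches only the affected node, which is the point of repairing the existing scene graph instead of rebuilding it.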
- node positions relative to the root can be used for the above update operations if attribute names are not provided for the SVG elements. Examples for the various update operations using relative node positions are:
- This command updates the attributes of the fifth and sixth child nodes of the first child starting at the root level.
- the above command adds a new element before the sixth child node of the second child starting at the root level.
- the above command deletes the first child of the third node starting at the root level.
- the above command replaces the second child of the fourth child node of the third child node starting from the root level.
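- Addressing a node by its position relative to the root can be sketched as below. The path notation `"/2/1"` (1-based child positions read from the root downward, with `/` meaning "child of") is an assumption made for illustration, and `node_at` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def node_at(root, path):
    """Follow 1-based child positions from the root.

    "/2/1" is read as: first child of the second child of the root.
    An empty path returns the root itself.
    """
    node = root
    for step in (s for s in path.split("/") if s):
        node = node[int(step) - 1]
    return node

scene = ET.fromstring("<svg><g><rect/><circle/></g><g><ellipse/></g></svg>")
first = node_at(scene, "/1/2").tag
second = node_at(scene, "/2/1").tag
print(first, second)
```

A node resolved this way can then be handed to an update operation, so elements without identifiers can still be modified.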
- FIGS. 9 ( a )- 9 ( e ) show SVG examples of various scene update operations.
- FIG. 9(a) shows an original SVG scene including a rectangle element, “myRect.”
- FIG. 9(b) shows the effect of using “SetAttribute,” which changes the position of the original element, “myRect.”
- FIG. 9(c) shows the effect of “AddElement,” adding a new item, “myCircle,” to the scene.
- FIG. 9(d) shows the effect of “DeleteElement,” deleting the “myRect” element from the scene.
- FIG. 9(e) shows the effect of “ReplaceElement,” in which a new item, “myCircle,” replaces the previously existing item, “myEllipse.”
- SVG Content Searching. Unlike traditional raster graphics, SVG has an underlying XML syntax, thus making search tasks relatively more straightforward. With the growing applicability of SVG for generating weather maps, traffic information and entertainment, the corpus of SVG documents is only increasing. Faster and more efficient retrieval techniques are therefore imperative. Content searching by example and content searching by keywords are two frameworks that can greatly benefit from the SVG similarity computation system of the present invention. Furthermore, with the ensuing popularity of SVG as a web-based vector graphics language, search engines can be extended to include searching for SVG documents as well.
- SVG documents can be queried on a few search engines such as Google.
- searches are mainly based on the name of the SVG file and the context surrounding it. Harnessing the similarity information and being able to prioritize parts of the SVG document with some importance attributes can result in a more intelligent search mechanism.
- An example of assigning such information is shown in FIG. 10.
- importance weights are normalized and range from 0 to 1, with 1 being most important.
- Documents can therefore be searched, where relevance is based on searching the most important parts of the SVG document.
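- A keyword search weighted by importance might look like the Python sketch below. The attribute name `importance`, the function `relevance`, and the scoring rule (summing the weights of elements whose text mentions the keyword) are all assumptions made for illustration; the text only states that weights are normalized between 0 and 1.

```python
import xml.etree.ElementTree as ET

def relevance(root, keyword, weight_attr="importance"):
    """Sum the importance weights of elements whose text mentions the keyword."""
    score = 0.0
    for elem in root.iter():
        text = (elem.text or "").lower()
        if keyword.lower() in text:
            score += float(elem.get(weight_attr, "0"))
    return score

doc_a = ET.fromstring(
    '<svg><title importance="1.0">Weather map</title>'
    '<text importance="0.2">legend</text></svg>'
)
doc_b = ET.fromstring(
    '<svg><title importance="1.0">Road signs</title>'
    '<text importance="0.2">weather note</text></svg>'
)

corpus = [("a", doc_a), ("b", doc_b)]
ranked = sorted(corpus, key=lambda item: relevance(item[1], "weather"), reverse=True)
ranked_names = [name for name, _ in ranked]
print(ranked_names)
```

Document "a" ranks first because the keyword occurs in its most important element, whereas in "b" it occurs only in a low-weight note.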
- the internal query for searching documents can be of the form: Specifying the node ->‘/’Node Position ‘/’ . . . where ‘/’ means ‘child of’.
- SVG documents have an XML-like structure and contain repeated and redundant information. By observing intra-document similarity, more optimization can be made while encoding SVG by searching for similarities in the DOM structure and eliminating redundant information.
- SVG is often verbose.
- SVG documents are therefore an ideal candidate for compression.
- a new set of compression algorithms may be developed for SVG.
- the step of determining a minimal logical representation of two SVG documents can include the application of other SVG rules in order to achieve a minimal logical representation based upon the application or system being used.
- SVG tree isomorphism may be applied on different varieties of SVG trees, such as normalized trees or the complete versions.
- variations in the notations of the parameters used in computing isomorphism may be used.
- FIG. 12 shows a system 10 in which the present invention can be utilized, comprising multiple communication devices that can communicate through a network.
- the system 10 may comprise any combination of wired or wireless networks including, but not limited to, a mobile telephone network, a wireless Local Area Network (LAN), a Bluetooth personal area network, an Ethernet LAN, a token ring LAN, a wide area network, the Internet, etc.
- the system 10 may include both wired and wireless communication devices.
- the system 10 shown in FIG. 12 includes a mobile telephone network 11 and the Internet 28 .
- Connectivity to the Internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and the like.
- the exemplary communication devices of the system 10 may include, but are not limited to, a mobile telephone 12 , a combination PDA and mobile telephone 14 , a PDA 16 , an integrated messaging device (IMD) 18 , a desktop computer 20 , and a notebook computer 22 .
- the communication devices may be stationary or mobile as when carried by an individual who is moving.
- the communication devices may also be located in a mode of transportation including, but not limited to, an automobile, a truck, a taxi, a bus, a boat, an airplane, a bicycle, a motorcycle, etc.
- Some or all of the communication devices may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24 .
- the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28 .
- the system 10 may include additional communication devices and communication devices of different types.
- the communication devices may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc.
- a communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
- FIGS. 13 and 14 show one representative mobile telephone 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile telephone 12 or other electronic device.
- the mobile telephone 12 of FIGS. 13 and 14 includes a housing 30 , a display 32 in the form of a liquid crystal display, a keypad 34 , a microphone 36 , an ear-piece 38 , a battery 40 , an infrared port 42 , an antenna 44 , a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48 , radio interface circuitry 52 , codec circuitry 54 , a controller 56 and a memory 58 .
- Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.
- the present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments.
- program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Abstract
Description
- The present invention relates generally to systems for Scalable Vector Graphics (SVG) documents. More particularly, the present invention relates to systems and methods for measuring the degree of similarity in SVG documents.
- Measuring a level or degree of similarity between objects can be very useful in many applications. Image similarity is performed based upon the images' inherent color and texture properties. 3D objects can be compared based on shape matching algorithms that consider topology and feature matching. Textual content can be matched by metrics ranging from a simple “diff” program to more advanced pattern matching and semantic grouping algorithms.
- Although many systems and mechanisms are known for determining and measuring levels of similarity among XML documents, computing similarity among SVG documents is not nearly so simple. SVG content is based upon an underlying XML format. The fundamental difficulty with SVG documents is that two given SVG documents with a very similar underlying XML representation may have completely different visual representations when rendered, and vice versa. FIGS. 1(a) and 1(b) exemplify this issue. Although the two SVG documents in FIGS. 1(a) and 1(b) look identical, their underlying textual representations are quite different from one another. In particular, FIG. 1(a) makes use of <defs> and <use> elements for predefining the shapes and reusing them with different colors and positions. FIG. 1(b), on the other hand, renders each shape separately without reusability. If a system relies on traditional document comparison methods to determine the similarity between these documents, the documents might be classified as being vastly different. In addition, traditional pixel-based methods for determining levels of similarity are not optimal, as one would have to convert the SVG-rendered content into raster graphics, and the process becomes even more complicated when animations are involved.
- FIGS. 2(a) and 2(b), on the other hand, show a situation where the SVG textual contents are similar to one another, but the documents themselves possess very different visual appearances. The only difference between the two documents in terms of the underlying SVG text is the attribute style="visibility:hidden". However, this small difference makes the ultimate images look quite different visually.
- Although SVG is considered a promising XML-based language for 2D graphics, potentially opening up a whole host of possibilities for new consumer and enterprise services, there has thus far been relatively little progress in optimizing SVG in these different applications.
- Several methods and tools have been previously developed for computing similarity among XML documents. For example, one tool called “XML Diff” detects structural changes in the XML sub-trees and produces a Diffgram to describe the differences between the two sub-trees. A second method involves the use of a matching algorithm for measuring the structural similarity between an XML document and a DTD. A third approach involves linearizing the structure of each XML document by representing it as a numerical sequence, and then comparing the sequences through the analysis of their frequencies. A fourth approach involves a structural similarity metric for XML documents based upon an “XML aware” edit distance between ordered labeled trees to cluster documents by DTD. A fifth method represents documents in vector form based upon their structure and then measures the similarity between the vectors. This method is used to obtain the measure of structural similarity between two given documents and is discussed in United States Application Publication No. 2005/0038785. However, none of these systems addresses the problem of differences between the underlying content and the visual representation, as XML by itself is not visual and SVG is a special form of XML content.
- In addition to the above, there are also several methods for compressing XML content based upon certain optimizations for removing redundant patterns. One such system involves a new XML compression scheme that is based upon the Sequitur compression algorithm to remove excessive information redundancy in its representation. By organizing the compression result as a set of context free grammar rules, the scheme supports processing of XPath queries without decompression. Another approach involves a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, referred to as XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data. XMill uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, as well as user-defined compressors for application specific data.
- The present invention provides an improved process of determining whether any two given SVG documents are visually similar, as well as determining the degree of similarity and the nature of similarity between the documents. The present invention introduces a mechanism for converting individual SVG documents into their minimal logical representations based upon certain SVG optimization heuristics. The similarity of these logical representations is then computed on the representations' reduced logical DOM trees. Tree isomorphism is useful to determine how similar tree structures are with respect to one another. Several metrics, such as the maximum common sub-tree and tree distance measurement, can be used to determine the maximum number of common nodes between two given trees. The larger the maximum common sub-tree is, the more similar are the two trees.
- The approach of the present invention for reducing SVG to its minimal logical form is different from the XML compression methods discussed above in that the minimal representation is computed in a substantially different manner with the present invention. Additionally, unlike the compression based methods discussed previously, the minimal logical SVG form is still uncompressed and is capable of still being rendered using the system and method of the present invention.
- The present invention is capable of vastly improving and even optimizing SVG's performance, so that the underlying logical representations that comprise large datasets of SVG content can be modified, updated or traversed more efficiently based upon the computed similarity information. The present invention is also unique in that it concerns SVG-type content where both the underlying XML representation and its visual appearance need to be considered before actually computing a degree of similarity. In addition, the present invention provides new techniques for computing a minimal logical representation of the SVG document, computing the similarity by applying tree isomorphism on the normalized SVG trees and representing this similarity information.
- The system and method of the present invention can be applied to various use cases particularly relevant to the mobile devices community. In a streaming scenario, this similarity information allows the client to be more selective about whether or not to use the arriving packet of data for reconstructing a scene graph. Furthermore, the client may be able to repair its existing scene graph based on information provided in the RTP payload header rather than destroying the entire scene graph and reconstructing a new one. In a content search scenario, similar SVG documents can be retrieved based upon either an example SVG document or a query of keywords. This facilitates more efficient and intelligent content search applications. In a compression scenario, by observing intra-document similarity, redundant information within a given SVG document can be removed by merely storing the element ID instead.
- These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
- FIGS. 1(a) and 1(b) show examples of two SVG documents that appear graphically similar, but the underlying textual representations are vastly different;
- FIGS. 2(a) and 2(b) show examples of two SVG documents that appear graphically different, but the underlying textual representations are similar;
- FIG. 3 shows the conversion of the underlying textual representation of a SVG document into a minimal logical representation by ignoring redundant, unreferenced and unused SVG elements;
- FIG. 4 demonstrates an example of resolving attributes while computing a minimal logical representation from a SVG document;
- FIG. 5 shows an example of ignoring less important information while computing a minimal logical representation;
- FIG. 6 shows an example of two SVG documents sharing some similar content between them;
- FIG. 7 shows an example of two SVG documents sharing some similar structure between them;
- FIG. 8 shows an example of two SVG documents sharing similar content as well as structure between them;
- FIG. 9(a) presents a graphical depiction of an original SVG scene; FIG. 9(b) shows the effect of a Scene Update: SetAttribute action; FIG. 9(c) shows the effect of a subsequent Scene Update: AddElement action; FIG. 9(d) shows the effect of a subsequent Scene Update: DeleteElement action; and FIG. 9(e) shows the effect of a subsequent Scene Update: ReplaceElement action;
- FIG. 10 is a representation showing the assignment of importance weights to individual pieces of content in a SVG document;
- FIG. 11 is a flow chart showing a process for implementing one embodiment of the present invention;
- FIG. 12 is an overview diagram of a system within which the present invention may be implemented;
- FIG. 13 is a perspective view of an electronic device that can incorporate the principles of the present invention; and
- FIG. 14 is a schematic representation of the circuitry of the electronic device of FIG. 13.
- The present invention provides a system and method for reducing SVG documents to their minimal logical DOM tree structure in order to avoid the problem of disparity between the visual and underlying textual representations of each document. The present invention then applies similarity metrics based on tree isomorphism to compute the degree of similarity between the SVG documents. The present invention can be used in applications such as streaming, content searching, and compression. Each of these use cases is discussed below.
- A process for computing similarity between the SVG samples according to the present invention is depicted in FIG. 11. At step 100, a minimal logical representation of each of two SVG documents is determined. At step 110, tree isomorphism is performed on each SVG document's Document Object Model (DOM). At step 120, the system applies similarity metrics to the created hierarchical trees for each document. At step 130, a representation of the computed similarity information for the two documents is created. This information can then be used by one of several applications at step 140. Each of these process portions is discussed below.
- The determining of a minimal logical representation of a SVG document involves four different sub-processes in one embodiment of the invention. The first sub-process involves collapsing the document. Redundant, unreferenced and unused SVG elements and comments are ignored. This sub-process is applied in order to remove elements that do not contribute to the visual representation of the SVG document. The document may often contain element definitions within the <defs> block that are not rendered or may be set to invisible. Such elements can be removed before comparing the document at issue to other SVG documents. This collapsing sub-process is depicted in FIG. 3, where comments and an element with a “visibility:hidden” setting are removed.
- A second sub-process involves “substitution.” This sub-process expands internal references and definitions within the SVG document, resolves references, and resolves relative URLs to absolute URLs for the sake of a fairer comparison. As seen in FIG. 1, <defs> and <use> elements are meant for the reusability of element definitions and can be used as many times as one desires in the SVG document. If there is a <use> element, the used content gets cloned and virtually appended to the <use> element. This cloned content is a “shadow tree” of the original element that gets “<use>ed.” Substitution involves expanding the element definition inline wherever a <use> element refers to it. This can be done in order to accurately compare the document with another visually similar SVG document that does not use <defs> and <use> elements, for example.
- The third sub-process in creating a minimal logical representation involves resolving attributes. Often, even if references are properly substituted from the <defs> block and the same element id is used with ‘xlink:href’, the attributes in the documents might be different. This can lead to significant variations in the visual representation. For example, in SVG animations, changes in height, width, begin time, etc. may lead to completely different visuals at a particular timestamp. One such example is depicted in FIG. 4. In this scenario, even though both documents use the same animate definition, the attributes in the actual animation are different, as indicated in the corresponding visual representations at the two-second mark. As the first animation starts one second earlier than the second animation, the rectangle has already changed color from purple to yellow by the two-second mark.
- The fourth sub-process in creating a minimal logical representation involves ignoring less significant differences. Individual elements or attributes in SVG often may not be as important as other elements or attributes. For example, differences in font style and size may be ignored when computing similarity for use in certain applications. Additionally, SVG embeds other media such as raster images, audio and video. These elements would require audio/video similarity metrics to be applied separately to them in addition to computing SVG similarity, or they can simply be ignored depending upon the particular application or system. In the example shown in FIG. 5, the two SVG documents are visually similar, except that the word ‘Hello’ is larger in the first image than in the second image. Looking at the textual representations, one can see that the font sizes vary. Depending upon the application, this may not be very important and, when creating the minimal logical representation, the ‘font-size’ attribute can be ignored if the application so permits.
- Step 110 in FIG. 11 involves tree isomorphism on each SVG document's Document Object Model (DOM). In SVG, the elements are rendered in the order in which they appear in the scene graph or DOM. Each element in the data format can be thought of as a canvas on which paint is applied. If objects are grouped together with a <g> tag, they are first rendered as a separate group canvas, then composited on the main canvas using filters or alpha masks associated with the group. The underlying SVG definitions are laid out in an XML-like node-labeled tree structure. In other words, the DOM can be viewed as a directed acyclic tree, allowing traversal of the nodes of the tree that constitute the various SVG elements.
- Two given trees are generally considered to be isomorphic if one tree can be transformed into the other by simply renaming nodes. A SVG document tree T can be defined as a 4-tuple T=(N, r, E, L), where N is a finite set of nodes; r=root(T) belonging to N is the root node ‘<svg>’; E is the set of edges and therefore a binary relation on N × N; and L assigns to each node a string representing an element name or attribute as defined in SVG 1.2, which can be found at http://www.w3.org/TR/SVG12/.
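- The four sub-processes of step 100 described earlier can be illustrated with a minimal Python sketch. It is not the patented implementation; it assumes the standard-library ElementTree parser (which discards comments by default), treats attributes on a <use> element as simple overrides on the cloned element (real SVG rendering treats x and y on <use> as a translation), and uses a hypothetical, application-chosen IGNORED_ATTRS set for the "less significant" attributes. The names minimize, is_hidden and local are illustrative only.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
IGNORED_ATTRS = {"font-size", "font-family"}  # hypothetical, application-specific choice


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def is_hidden(elem):
    style = elem.get("style", "").replace(" ", "")
    return "visibility:hidden" in style or elem.get("visibility") == "hidden"


def minimize(root):
    """Build a reduced copy of the tree (collapse, substitute, ignore)."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}

    def rebuild(elem):
        # Substitution: expand <use> into a clone of the referenced element.
        if local(elem.tag) == "use":
            target = by_id.get(elem.get(XLINK_HREF, "").lstrip("#"))
            if target is None:
                return None  # unresolved reference contributes nothing
            clone = copy.deepcopy(target)
            clone.attrib.pop("id", None)
            for key, value in elem.attrib.items():
                if key != XLINK_HREF:
                    clone.set(key, value)  # simplification: plain override
            elem = clone
        # Collapsing: drop definitions and elements that are never visible.
        if local(elem.tag) == "defs" or is_hidden(elem):
            return None
        # Ignoring less significant attributes.
        kept = {k: v for k, v in elem.attrib.items() if k not in IGNORED_ATTRS}
        out = ET.Element(elem.tag, kept)
        out.text = (elem.text or "").strip() or None
        for child in elem:
            reduced = rebuild(child)
            if reduced is not None:
                out.append(reduced)
        return out

    return rebuild(root)


DOC_A = """<svg xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- reusable definition -->
  <defs><rect id="r" width="10" height="10"/></defs>
  <use xlink:href="#r" fill="red"/>
  <circle r="5" style="visibility:hidden"/>
  <text font-size="20">Hello</text>
</svg>"""
DOC_B = """<svg><rect width="10" height="10" fill="red"/>
  <text font-size="12">Hello</text></svg>"""

min_a = minimize(ET.fromstring(DOC_A))
min_b = minimize(ET.fromstring(DOC_B))
print(ET.tostring(min_a) == ET.tostring(min_b))
```

- In this sketch the two textually different documents reduce to the same tree, which is the situation of FIGS. 1(a) and 1(b). Attribute resolution for animations (the third sub-process) is not modeled here.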
- In order for two SVG trees T1=(N1, r1, E1, L1) and T2=(N2, r2, E2, L2) to be isomorphic, there needs to be a one-to-one mapping f: N1->N2 such that {v,w} is an edge in E1 if and only if {f(v), f(w)} is an edge in E2. Here, T1 and T2 represent the minimal logical representations of the two SVG documents.
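- One common way to test this condition for rooted trees is a canonical encoding in the style of the Aho-Hopcroft-Ullman algorithm: two rooted trees are isomorphic exactly when their sorted, nested encodings are equal. The Python sketch below is an illustration under that assumption (rooted, unordered isomorphism); the source does not prescribe a particular algorithm, and the optional use_labels flag, which additionally requires element names to match, is a hypothetical extension.

```python
import xml.etree.ElementTree as ET


def canonical(node, use_labels=False):
    """Encode a rooted tree so that isomorphic trees get equal strings."""
    label = node.tag if use_labels else ""
    children = sorted(canonical(child, use_labels) for child in node)
    return "(" + label + "".join(children) + ")"


def isomorphic(a, b, use_labels=False):
    return canonical(a, use_labels) == canonical(b, use_labels)


t1 = ET.fromstring("<svg><g><rect/><circle/></g><text/></svg>")
t2 = ET.fromstring("<svg><a><b/><c/></a><d/></svg>")        # same shape, renamed nodes
t3 = ET.fromstring("<svg><g><rect/></g><text/><path/></svg>")  # different shape

print(isomorphic(t1, t2), isomorphic(t1, t3))
```

- With use_labels=True the check becomes stricter, since renaming a node then breaks the match.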
- Step 120 in FIG. 11 involves applying similarity metrics. Different similarity metrics that are used for computing hierarchical tree similarity can be applied. The present invention can use any of the following metrics while still providing superior performance.
- (1) If A and B are two logically minimized SVG DOM trees, the set operation ∩ (intersection) can be defined. This operation denotes a SVG sub-tree common to both trees A and B. The cardinality |T| of a SVG DOM tree T is the number of nodes (SVG elements) in T. A percentage measure of similarity can be given by:
- S1 = (|A∩B| / |A|) * 100
- S2 = (|A∩B| / |B|) * 100
- (2) Tree Distance Measurements: In this example, A and B are two logically minimized SVG DOM trees, D(A,B) is their tree edit distance (i.e., the number of nodes that are different) and Cost(A,B) is the cost to delete the different nodes in A and insert the new nodes from B. The tree distance measure S between A and B is then defined as S(A,B)=D(A,B)/Cost(A,B). S(A,B) is low when the SVG documents are similar and have a high percentage of matching nodes, and high when the documents are very different and have a low percentage of matching nodes. In this scenario, the value ranges between 0 and 1.
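- The percentage measures S1 and S2 of metric (1) can be sketched as follows. The source does not fix how the common sub-tree A∩B is found; this Python example assumes a top-down, ordered common sub-tree in which matched nodes must have the same element name, computed with a longest-common-subsequence style recurrence over the children. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def size(tree):
    """Cardinality |T|: the number of element nodes in the tree."""
    return sum(1 for _ in tree.iter())


def common_size(a, b):
    """Size of the largest top-down ordered sub-tree shared by a and b."""
    if a.tag != b.tag:
        return 0
    ca, cb = list(a), list(b)
    # dp[i][j]: best match between the first i children of a and first j of b
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1] + common_size(ca[i - 1], cb[j - 1]),
            )
    return 1 + dp[-1][-1]


def similarity(a, b):
    """Return (S1, S2) as percentages of |A| and |B| respectively."""
    common = common_size(a, b)
    return 100.0 * common / size(a), 100.0 * common / size(b)


doc_a = ET.fromstring("<svg><g><rect/><circle/></g></svg>")
doc_b = ET.fromstring("<svg><g><rect/></g><text/><path/></svg>")
s1, s2 = similarity(doc_a, doc_b)
print(s1, s2)
```

- Here the shared sub-tree is svg, g and rect (three nodes), so S1 is three quarters of A and S2 is three fifths of B. The two values differ whenever the trees differ in size, which is why the source defines both.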
- The similarity between any two minimal logical SVG scene graphs can be characterized based on three categories: content similarity, structural similarity, and positional content similarity.
- Content similarity involves determining the common content intersection between two logical SVG scene graphs. The equality of contents in two given SVG scene graphs is analogous to sub-string matching. For example, the two SVG scene graphs illustrated in FIG. 6 share some common road sign symbol definitions. A fragment of the common SVG content shared between the two SVG documents is provided below the scene graphs.
- Structural similarity involves determining the common structural intersection between two logical SVG scene graphs. Structurally similar SVG scene graphs have similar tree hierarchies, i.e., they share some common elements and children of elements. However, the attributes of the common elements might be different. In FIG. 7, the documents have some structural similarity, but the icons in each of the documents have different attributes and IDs. For example, in the first SVG document the icon elements define weather symbols, and in the second SVG document the icon elements define road signs. Additionally, the fill color attributes for various elements in the two files are different. However, the two documents do have a similar underlying map structure with road connectors and cities.
- Positional content similarity involves the consideration of content as well as position. Positional content similarity is a measure of the common content positioned in the common structure and is most suitable for determining SVG packet similarity while streaming, as it provides the most information. The higher the percentage of positional content similarity between two SVG scene graphs, the closer the SVG scene graphs are to being identical. In FIG. 8, both SVG documents/scene graphs are positionally similar with respect to the map layout. However, SVG Document B has a zooming capability on one region of the map.
- Step 130 in FIG. 11 involves the creation of a representation of the computed similarity information. While the similarity metrics inform the user of the type of similarity and how similar the SVG scene graphs are to one another, the similarity information tells the user what is similar. For example, in a streaming scenario, if the positional content similarity between two SVG samples is greater than 80%, then the streaming server can choose to transmit the similarity information to the client. The client can then modify the scene graph based upon this incoming similarity information.
- SVG is XML-based, and the similarity information between two SVG scene graphs can be intuitively represented using node positions in the SVG scene graph. The syntax grammar that can be used to specify node positions for SVG similarity information is defined as follows:
- Specifying the node -> ‘/’ Node Position ‘/’ . . . where ‘/’ means “child of”.
- Range of nodes -> Node Position ‘-’ Node Position
- For example, /1/2/3-6 means the third, fourth, fifth and sixth children of the second child of the first node, starting at the root <svg> level.
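- A minimal Python sketch of this node-position grammar is given below. It rests on two assumptions that the text leaves open: positions are 1-based child indices with the first segment counted among the children of the root <svg> element, and a range may appear only in the last segment. The helper names parse_position and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_position(position):
    """Expand a position such as '/1/2/3-6' into a list of index paths."""
    if not position.startswith("/"):
        raise ValueError("a node position must start with '/'")
    *parents, last = position.strip("/").split("/")
    prefix = [int(segment) for segment in parents]
    if "-" in last:
        first, final = (int(part) for part in last.split("-"))
    else:
        first = final = int(last)
    return [prefix + [index] for index in range(first, final + 1)]


def resolve(root, path):
    """Follow a list of 1-based child indices down from the root element."""
    node = root
    for index in path:
        node = node[index - 1]
    return node


root = ET.fromstring("<svg><g><a/><b/><c/></g><h/></svg>")
paths = parse_position("/1/2-3")
print([resolve(root, path).tag for path in paths])
```

- Keeping parsing separate from resolution lets the same path list be reused against both the server's and the client's copy of the scene graph.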
- Step 140 in FIG. 11 involves having various applications use the generated similarity information. Three such applications are discussed herein. However, other applications may also be capable of using the generated similarity information of the present invention.
- Streaming. SVG content has the capability of providing a framework for audio and video content and can be streamed across a network to many clients at a given time. As streaming itself does not promise resiliency in data delivery, one of several error correction mechanisms is retransmission of the same sample at specific time intervals, as long as it is temporally valid in the presentation. In order to ensure that a client receives the SVG scene relevant at a particular time instant, the streaming server transmits the current SVG scene at regular intervals until it becomes temporally invalid.
- A problem arises when a client rendering a current SVG scene receives the same packet, destroys its existing scene graph, and reconstructs it again with the new packet. This leads to severe performance problems on the client side when it repeatedly destroys and reconstructs the SVG scene at regular intervals. There has been no conventional solution for determining similarity between SVG samples while streaming, and no information has been provided in the real-time transport protocol (RTP) payload format to do so. By providing an effective method for comparing two adjacent samples of the SVG media content while streaming, this information can be specified as a new unit in the RTP payload header. The client, upon receiving a sample, can therefore simply read this information from the header and choose to destroy its existing scene graph only if the new sample is different.
- In a streaming scenario, the similarity information itself can be used to dynamically update the SVG DOM on the client by using update syntax in the form of add, delete, or replace operations sent by the server. The streaming server can send either similarity or dissimilarity information in the RTP packet. Based upon the similarity between packets, the positions in the client's DOM can be dynamically modified. This improves the client's performance by minimizing the number of reconstructions of the DOM, as the DOM tree is merely repaired or modified. It should be noted that if the SVG scene graphs being compared are very similar, it may be more efficient to send dissimilarity information rather than similarity information. The scene update instructions are as follows.
- SetAttribute: This element is used to update attributes of the target element in the scene, which is specified by xlink:href. The attributeName and attributeValue attributes give the name and value of the attribute to be added or replaced on the target element.
- <SetAttribute attributeName="x" attributeValue="10" attributeType="CSS/XML/auto" xlink:href="#myRect"/>
- AddElement: This element is used to update the scene with a new element (myCircle) as a child of the specified parent element (xlink:href). If insertBefore is specified, the new element is inserted before the element that insertBefore identifies.
- <AddElement xlink:href="#Scene1" insertBefore="#myRect">
- <circle id="myCircle" cx="20" cy="20" r="50" fill="yellow"/>
- </AddElement>
- DeleteElement: This element is used to delete the specified element (xlink:href) from the scene or DOM.
- <DeleteElement xlink:href="#myRect"/>
- This update is ignored if the element specified in the syntax does not exist. Also, if the element in question has children, the entire sub-tree is removed from the client's memory. However, only the element is deleted if it is a leaf node.
- ReplaceElement: This element is used to replace an existing element (xlink:href) in the scene with a new element (myEllipse). This operation is essentially an in-order combination of DeleteElement and AddElement.
- <ReplaceElement xlink:href="#myCircle">
- <ellipse id="myEllipse" cx="40" cy="35" rx="110" ry="60" fill="blue"/>
- </ReplaceElement>
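- How a client might apply these four id-based update instructions to its scene graph can be sketched in Python. This is an illustration only: it assumes the scene graph is held as a standard-library ElementTree, the function names are hypothetical, and parsing of the update elements themselves (and of the RTP payload) is left out. Updates that name a missing element are silently ignored, as the text specifies for DeleteElement.

```python
import xml.etree.ElementTree as ET


def find_with_parent(root, elem_id):
    """Return (parent, element) for the given id, or (None, None)."""
    if root.get("id") == elem_id:
        return None, root  # the root has no parent
    for parent in root.iter():
        for child in parent:
            if child.get("id") == elem_id:
                return parent, child
    return None, None


def set_attribute(root, target_id, name, value):
    _, elem = find_with_parent(root, target_id)
    if elem is not None:
        elem.set(name, value)  # adds the attribute or replaces its value


def add_element(root, parent_id, new_elem, insert_before=None):
    _, parent = find_with_parent(root, parent_id)
    if parent is None:
        return
    ids = [child.get("id") for child in parent]
    if insert_before is not None and insert_before in ids:
        parent.insert(ids.index(insert_before), new_elem)
    else:
        parent.append(new_elem)


def delete_element(root, target_id):
    parent, elem = find_with_parent(root, target_id)
    if parent is not None:
        parent.remove(elem)  # removes the element and its whole sub-tree


def replace_element(root, target_id, new_elem):
    parent, elem = find_with_parent(root, target_id)
    if parent is not None:
        index = list(parent).index(elem)
        parent.remove(elem)             # DeleteElement ...
        parent.insert(index, new_elem)  # ... followed by AddElement in place


scene = ET.fromstring(
    '<svg><g id="Scene1"><rect id="myRect" x="0" y="0" width="30" height="20"/></g></svg>'
)
set_attribute(scene, "myRect", "x", "10")
add_element(scene, "Scene1", ET.Element("circle", {"id": "myCircle"}), "myRect")
delete_element(scene, "myRect")
replace_element(scene, "myCircle", ET.Element("ellipse", {"id": "myEllipse"}))
print([child.get("id") for child in scene[0]])
```

- The design choice here is to repair the existing tree in place, so the client never rebuilds the scene graph for a small change.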
- In addition, node positions relative to the root can be used for the above update operations if attribute names are not provided for the SVG elements. Examples for the various update operations using relative node positions are:
- <SetAttribute attributeName="x" attributeValue="10" attributeType="CSS/XML/auto" position="/1/5-6"/>
- This command updates the attributes of the fifth and sixth child nodes of the first child starting at the root level.
- <AddElement xlink:href="#Scene1" insertBefore="/2/6">
- <circle id="myCircle" cx="20" cy="20" r="50" fill="yellow"/>
- </AddElement>
- The above command adds a new element before the sixth child node of the second child starting at the root level.
- <DeleteElement position="/3/1"/>
- The above command deletes the first child of the third node starting at the root level.
- <ReplaceElement xlink:href="/3/4/2">
- <ellipse id="myEllipse" cx="40" cy="35" rx="110" ry="60" fill="blue"/>
- </ReplaceElement>
- The above command replaces the second child of the fourth child node of the third child node starting from the root level.
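- The position-based variants of the delete and replace operations can be sketched in the same spirit. The Python example below assumes that each position segment is a 1-based child index counted from the children of the root <svg> element, which is one reading of the examples above; locate, delete_at and replace_at are hypothetical names.

```python
import xml.etree.ElementTree as ET


def locate(root, position):
    """Return (parent, zero_based_index) for a single position like '/3/1'."""
    indices = [int(segment) for segment in position.strip("/").split("/")]
    parent = root
    for index in indices[:-1]:
        parent = parent[index - 1]
    return parent, indices[-1] - 1


def delete_at(root, position):
    parent, i = locate(root, position)
    if 0 <= i < len(parent):   # ignore updates that name a missing node
        del parent[i]          # drops the node together with its sub-tree


def replace_at(root, position, new_elem):
    parent, i = locate(root, position)
    if 0 <= i < len(parent):
        parent[i] = new_elem


doc = ET.fromstring("<svg><a/><b/><c><x/><y/></c></svg>")
delete_at(doc, "/3/1")
replace_at(doc, "/3/1", ET.Element("ellipse", {"id": "myEllipse"}))
print([child.tag for child in doc[2]])
```

- Positions shift after a deletion, so a server and client using this scheme would need to apply updates in the same order.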
- FIGS. 9(a)-9(e) show SVG examples of various scene update operations. FIG. 9(a) shows an original SVG scene including a rectangle element, “myRect.” FIG. 9(b) shows the effect of using the “SetAttribute” element, which changes the position of the original element, “myRect.” FIG. 9(c) shows the effect of the “AddElement” element, adding a new item, “myCircle,” to the scene. FIG. 9(d) shows the effect of the “DeleteElement” element, deleting the “myRect” element from the scene. FIG. 9(e) shows the effect of the “ReplaceElement” element, in which a new item, “myEllipse,” replaces the previously existing item, “myCircle.”
- Content Searching. Unlike traditional raster graphics, SVG has an underlying XML syntax, thus making search tasks relatively more straightforward. With the growing applicability of SVG for generating weather maps, traffic information and entertainment, the corpus of SVG documents is only increasing. Faster and more efficient retrieval techniques are therefore imperative. Content searching by example and content searching by keywords are two frameworks that can greatly benefit from the SVG similarity computation system of the present invention. Furthermore, with the growing popularity of SVG as a web-based vector graphics language, search engines can be extended to include searching for SVG documents as well.
- SVG documents can be queried on a few search engines such as Google. However, such searches are mainly based on the name of the SVG file and the context surrounding it. Harnessing the similarity information and being able to prioritize parts of the SVG document with some importance attributes can result in a more intelligent search mechanism.
- An example of assigning such information is shown in FIG. 10. In this case, importance weights are normalized and range from 0 to 1, with 1 being most important. Documents can therefore be searched, where relevance is based on searching the most important parts of the SVG document. The internal query for searching documents can be of the form: Specifying the node -> ‘/’ Node Position ‘/’ . . . where ‘/’ means “child of”.
- Compression. Considerable effort has been made to make SVG file sizes as small as possible while still retaining the benefits of XML and achieving compatibility and leverage with other World Wide Web Consortium (W3C) specifications. SVG documents have an XML-like structure and contain repeated and redundant information. By observing intra-document similarity, further optimization can be made while encoding SVG by searching for similarities in the DOM structure and eliminating redundant information.
- Like any textual markup language, SVG is often verbose. There is a great deal of “redundant” data in a SVG document, including white space, comments, and element and attribute names, as pointed out previously. SVG documents are therefore ideal candidates for compression. By extending the normalization of the SVG documents into their minimal logical representation, a new set of compression algorithms may be developed for SVG.
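- One way to observe intra-document similarity is to look for sub-trees that occur more than once, since each repeat is a candidate for being stored once and referenced by element ID. The Python sketch below is an assumption-laden illustration rather than a compression scheme from the source: it compares sub-trees by a structural signature that ignores the id attribute, and the names signature and repeated_subtrees are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def signature(elem):
    """Hashable description of a sub-tree, ignoring 'id' attributes."""
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items() if k != "id"))
    text = (elem.text or "").strip()
    return (elem.tag, attrs, text, tuple(signature(child) for child in elem))


def repeated_subtrees(root):
    """Group elements whose sub-trees are identical and occur at least twice."""
    groups = defaultdict(list)
    for elem in root.iter():
        groups[signature(elem)].append(elem)
    return [elems for elems in groups.values() if len(elems) > 1]


doc = ET.fromstring(
    '<svg>'
    '<g id="a"><rect width="5" height="5"/><circle r="2"/></g>'
    '<g id="b"><rect width="5" height="5"/><circle r="2"/></g>'
    '<path d="M0 0"/>'
    '</svg>'
)
groups = repeated_subtrees(doc)
print(sorted(group[0].tag for group in groups))
```

- The signature is recomputed for every node, which is quadratic in the worst case; a production encoder would more likely cache or hash the signatures bottom-up.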
- The following are several examples of alternative implementations of the present invention. In one such implementation, the step of determining a minimal logical representation of two SVG documents can include the application of other SVG rules in order to achieve a minimal logical representation based upon the application or system being used.
- In another implementation, the concept of SVG tree isomorphism may be applied on different varieties of SVG trees, such as normalized trees or the complete versions. In an additional embodiment, variations in the notations of the parameters used in computing isomorphism may be used.
- In another embodiment of the present invention, other commonly used similarity metrics can be applied on hierarchical trees. Along the same lines there may be other variations in similarity categories at the step of applying similarity metrics to the created hierarchical trees for each document.
- In yet another embodiment of the invention, there may be variations in syntax as to how individual elements are accessed in the DOM tree. The scene update syntax may also include variations for the streaming scenario depending upon the client and server used and the application itself. Lastly, modifications can be made to how the SVG minimal logical representation, tree isomorphism and similarity metrics are computed in search and compression scenarios. All of the above embodiments and implementations can also be combined as necessary or desired to meet system and application requirements.
- FIG. 12 shows a system 10 in which the present invention can be utilized, comprising multiple communication devices that can communicate through a network. The system 10 may comprise any combination of wired or wireless networks including, but not limited to, a mobile telephone network, a wireless Local Area Network (LAN), a Bluetooth personal area network, an Ethernet LAN, a token ring LAN, a wide area network, the Internet, etc. The system 10 may include both wired and wireless communication devices.
- For exemplification, the system 10 shown in FIG. 12 includes a mobile telephone network 11 and the Internet 28. Connectivity to the Internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and the like.
- The exemplary communication devices of the system 10 may include, but are not limited to, a mobile telephone 12, a combination PDA and mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, and a notebook computer 22. The communication devices may be stationary or mobile as when carried by an individual who is moving. The communication devices may also be located in a mode of transportation including, but not limited to, an automobile, a truck, a taxi, a bus, a boat, an airplane, a bicycle, a motorcycle, etc. Some or all of the communication devices may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28. The system 10 may include additional communication devices and communication devices of different types.
- The communication devices may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
- FIGS. 13 and 14 show one representative mobile telephone 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile telephone 12 or other electronic device. The mobile telephone 12 of FIGS. 13 and 14 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58. Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.
- The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments.
- Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
- Software and web implementations of the present invention could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps. It should also be noted that the words “component” and “module,” as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
- The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.
Claims (22)
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/245,859 US7403951B2 (en) | 2005-10-07 | 2005-10-07 | System and method for measuring SVG document similarity |
PCT/IB2006/002779 WO2007042891A2 (en) | 2005-10-07 | 2006-10-06 | System and method for measuring svg document similarity |
KR1020087010901A KR101040094B1 (en) | 2005-10-07 | 2006-10-06 | System and method for measuring SVG document similarity |
CN2006800434767A CN101589384B (en) | 2005-10-07 | 2006-10-06 | System and method for measuring svg document similarity |
EP06808960.6A EP1932085A4 (en) | 2005-10-07 | 2006-10-06 | System and method for measuring svg document similarity |
JP2008534104A JP2009512006A (en) | 2005-10-07 | 2006-10-06 | System and method for measuring similarity of SVG documents |
TW095137312A TW200805089A (en) | 2005-10-07 | 2006-10-11 | System and method for measuring SVG document similarity |
HK10100703.6A HK1133711A1 (en) | 2005-10-07 | 2010-01-22 | System and method for measuring svg document similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/245,859 US7403951B2 (en) | 2005-10-07 | 2005-10-07 | System and method for measuring SVG document similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070083808A1 (en) | 2007-04-12 |
US7403951B2 (en) | 2008-07-22 |
Family
ID=37912206
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/245,859 (Expired - Fee Related), US7403951B2 (en) | 2005-10-07 | 2005-10-07 | System and method for measuring SVG document similarity |
Country Status (8)
Country | Link |
---|---|
US (1) | US7403951B2 (en) |
EP (1) | EP1932085A4 (en) |
JP (1) | JP2009512006A (en) |
KR (1) | KR101040094B1 (en) |
CN (1) | CN101589384B (en) |
HK (1) | HK1133711A1 (en) |
TW (1) | TW200805089A (en) |
WO (1) | WO2007042891A2 (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080016087A1 (en) * | 2006-07-11 | 2008-01-17 | One Microsoft Way | Interactively crawling data records on web pages |
US20080086679A1 (en) * | 2006-10-05 | 2008-04-10 | Educational Testing Service | Data structure for defining a chart |
US20100312755A1 (en) * | 2006-10-07 | 2010-12-09 | Eric Hildebrandt | Method and apparatus for compressing and decompressing digital data by electronic means using a context grammar |
US20110225289A1 (en) * | 2010-03-12 | 2011-09-15 | Fujitsu Limited | Determining Differences in an Event-Driven Application Accessed in Different Client-Tier Environments |
US8095543B1 (en) | 2008-07-31 | 2012-01-10 | The United States Of America As Represented By The Secretary Of The Navy | Fast algorithms and metrics for comparing hierarchical clustering information trees and numerical vectors |
WO2012148692A1 (en) * | 2011-04-28 | 2012-11-01 | Qualcomm Incorporated | Memoizing web - browsing computation with dom-based isomorphism |
US20130091150A1 (en) * | 2010-06-30 | 2013-04-11 | Jian-Ming Jin | Determiining similarity between elements of an electronic document |
US20130091414A1 (en) * | 2011-10-11 | 2013-04-11 | Omer BARKOL | Mining Web Applications |
US20130151562A1 (en) * | 2010-07-08 | 2013-06-13 | Hitachi, Ltd. | Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence |
US8773470B2 (en) | 2010-05-07 | 2014-07-08 | Apple Inc. | Systems and methods for displaying visual information on a device |
US8832108B1 (en) * | 2012-03-28 | 2014-09-09 | Emc Corporation | Method and system for classifying documents that have different scales |
US8832065B2 (en) | 2010-10-29 | 2014-09-09 | Fujitsu Limited | Technique for coordinating the distributed, parallel crawling of interactive client-server applications |
US8843494B1 (en) | 2012-03-28 | 2014-09-23 | Emc Corporation | Method and system for using keywords to merge document clusters |
US8880588B2 (en) | 2010-10-29 | 2014-11-04 | Fujitsu Limited | Technique for stateless distributed parallel crawling of interactive client-server applications |
US8880951B2 (en) | 2012-04-06 | 2014-11-04 | Fujitsu Limited | Detection of dead widgets in software applications |
US9069768B1 (en) | 2012-03-28 | 2015-06-30 | Emc Corporation | Method and system for creating subgroups of documents using optical character recognition data |
US9208054B2 (en) | 2011-02-14 | 2015-12-08 | Fujitsu Limited | Web service for automated cross-browser compatibility checking of web applications |
US20160093056A1 (en) * | 2014-09-29 | 2016-03-31 | Digitalglobe, Inc. | Multi-spectral image labeling with radiometric attribute vectors of image space representation components |
US9396540B1 (en) | 2012-03-28 | 2016-07-19 | Emc Corporation | Method and system for identifying anchors for fields using optical character recognition data |
US9400962B2 (en) | 2010-10-29 | 2016-07-26 | Fujitsu Limited | Architecture for distributed, parallel crawling of interactive client-server applications |
CN110874526A (en) * | 2018-12-29 | 2020-03-10 | 北京安天网络安全技术有限公司 | File similarity detection method and device, electronic equipment and storage medium |
CN113885307A (en) * | 2021-10-12 | 2022-01-04 | 广东安朴电力技术有限公司 | SVG parallel machine redundancy control method, SVG control method and SVG control system |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102007032812A1 (en) * | 2007-07-13 | 2009-01-22 | Siemens Ag | Method and device for creating a complexity vector for at least part of an SVG scene, and method and checking device for checking a playability of at least part of an SVG scene on a device |
CN101777048B (en) * | 2009-01-14 | 2013-04-03 | 国际商业机器公司 | Method and device for solving UI style conflict in web application composition |
US9311425B2 (en) * | 2009-03-31 | 2016-04-12 | Qualcomm Incorporated | Rendering a page using a previously stored DOM associated with a different page |
JP5478936B2 (en) * | 2009-05-13 | 2014-04-23 | キヤノン株式会社 | Information processing apparatus and information processing method |
US8667015B2 (en) * | 2009-11-25 | 2014-03-04 | Hewlett-Packard Development Company, L.P. | Data extraction method, computer program product and system |
US9449024B2 (en) | 2010-11-19 | 2016-09-20 | Microsoft Technology Licensing, Llc | File kinship for multimedia data tracking |
JP5701830B2 (en) * | 2012-09-04 | 2015-04-15 | 日本電信電話株式会社 | Document structure analysis apparatus and program |
US10635744B2 (en) * | 2016-04-21 | 2020-04-28 | Arivis Ag | File format agnostic document viewing, link creation and validation in a multi-domain document hierarchy |
KR101886069B1 (en) * | 2016-12-13 | 2018-08-07 | 유비스톰 주식회사 | Electronic document providing apparatus of division transferable of web page |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020073080A1 (en) * | 2000-01-14 | 2002-06-13 | Lipkin Daniel S. | Method and apparatus for an information server |
US20030050931A1 (en) * | 2001-08-28 | 2003-03-13 | Gregory Harman | System, method and computer program product for page rendering utilizing transcoding |
US20040004619A1 (en) * | 2002-06-19 | 2004-01-08 | Nokia Corporation | Method and apparatus for extending structured content to support streaming |
US20050038785A1 (en) * | 2003-07-29 | 2005-02-17 | Neeraj Agrawal | Determining structural similarity in semi-structured documents |
US20050182778A1 (en) * | 2002-07-15 | 2005-08-18 | Jorg Heuer | Method and devices for encoding/decoding structured documents, particularly xml documents |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20020049164A (en) * | 2000-12-19 | 2002-06-26 | 오길록 | The System and Method for Auto - Document - classification by Learning Category using Genetic algorithm and Term cluster |
KR20020058639A (en) * | 2000-12-30 | 2002-07-12 | 오길록 | A XML Document Retrieval System and Method of it |
US7055092B2 (en) * | 2001-12-05 | 2006-05-30 | Canon Kabushiki Kaisha | Directory for multi-page SVG document |
US7437664B2 (en) * | 2002-06-18 | 2008-10-14 | Microsoft Corporation | Comparing hierarchically-structured documents |
JP2004287923A (en) * | 2003-03-24 | 2004-10-14 | Seiko Epson Corp | Image processing decision device, image processing decision method and image processing decision program |
US7877399B2 (en) * | 2003-08-15 | 2011-01-25 | International Business Machines Corporation | Method, system, and computer program product for comparing two computer files |
FR2860315A1 (en) * | 2003-09-30 | 2005-04-01 | Canon Kk | METHOD AND DEVICE FOR EDITING DIGITAL GRAPHICS OF THE SVG TYPE, IN PARTICULAR FROM A BUILDER |
JP2005115628A (en) * | 2003-10-07 | 2005-04-28 | Hewlett-Packard Development Co Lp | Document classification apparatus using stereotyped expression, method, program |
KR100862587B1 (en) | 2007-03-28 | 2008-10-09 | 인하대학교 산학협력단 | Apparatus for measuring XML document similarity and method therefor |
2005
- 2005-10-07 US US11/245,859 patent/US7403951B2/en not_active Expired - Fee Related
2006
- 2006-10-06 WO PCT/IB2006/002779 patent/WO2007042891A2/en active Application Filing
- 2006-10-06 EP EP06808960.6A patent/EP1932085A4/en not_active Withdrawn
- 2006-10-06 CN CN2006800434767A patent/CN101589384B/en not_active Expired - Fee Related
- 2006-10-06 JP JP2008534104A patent/JP2009512006A/en active Pending
- 2006-10-06 KR KR1020087010901A patent/KR101040094B1/en not_active IP Right Cessation
- 2006-10-11 TW TW095137312A patent/TW200805089A/en unknown
2010
- 2010-01-22 HK HK10100703.6A patent/HK1133711A1/en not_active IP Right Cessation
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020073080A1 (en) * | 2000-01-14 | 2002-06-13 | Lipkin Daniel S. | Method and apparatus for an information server |
US20030050931A1 (en) * | 2001-08-28 | 2003-03-13 | Gregory Harman | System, method and computer program product for page rendering utilizing transcoding |
US20040004619A1 (en) * | 2002-06-19 | 2004-01-08 | Nokia Corporation | Method and apparatus for extending structured content to support streaming |
US20050182778A1 (en) * | 2002-07-15 | 2005-08-18 | Jorg Heuer | Method and devices for encoding/decoding structured documents, particularly xml documents |
US20050038785A1 (en) * | 2003-07-29 | 2005-02-17 | Neeraj Agrawal | Determining structural similarity in semi-structured documents |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080016087A1 (en) * | 2006-07-11 | 2008-01-17 | One Microsoft Way | Interactively crawling data records on web pages |
US7555480B2 (en) * | 2006-07-11 | 2009-06-30 | Microsoft Corporation | Comparatively crawling web page data records relative to a template |
US20080086679A1 (en) * | 2006-10-05 | 2008-04-10 | Educational Testing Service | Data structure for defining a chart |
US20100312755A1 (en) * | 2006-10-07 | 2010-12-09 | Eric Hildebrandt | Method and apparatus for compressing and decompressing digital data by electronic means using a context grammar |
US8095543B1 (en) | 2008-07-31 | 2012-01-10 | The United States Of America As Represented By The Secretary Of The Navy | Fast algorithms and metrics for comparing hierarchical clustering information trees and numerical vectors |
US20110225289A1 (en) * | 2010-03-12 | 2011-09-15 | Fujitsu Limited | Determining Differences in an Event-Driven Application Accessed in Different Client-Tier Environments |
US9032067B2 (en) * | 2010-03-12 | 2015-05-12 | Fujitsu Limited | Determining differences in an event-driven application accessed in different client-tier environments |
US8773470B2 (en) | 2010-05-07 | 2014-07-08 | Apple Inc. | Systems and methods for displaying visual information on a device |
US20130091150A1 (en) * | 2010-06-30 | 2013-04-11 | Jian-Ming Jin | Determiining similarity between elements of an electronic document |
US20130151562A1 (en) * | 2010-07-08 | 2013-06-13 | Hitachi, Ltd. | Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence |
US8880588B2 (en) | 2010-10-29 | 2014-11-04 | Fujitsu Limited | Technique for stateless distributed parallel crawling of interactive client-server applications |
US9400962B2 (en) | 2010-10-29 | 2016-07-26 | Fujitsu Limited | Architecture for distributed, parallel crawling of interactive client-server applications |
US8832065B2 (en) | 2010-10-29 | 2014-09-09 | Fujitsu Limited | Technique for coordinating the distributed, parallel crawling of interactive client-server applications |
US9208054B2 (en) | 2011-02-14 | 2015-12-08 | Fujitsu Limited | Web service for automated cross-browser compatibility checking of web applications |
WO2012148692A1 (en) * | 2011-04-28 | 2012-11-01 | Qualcomm Incorporated | Memoizing web - browsing computation with dom-based isomorphism |
US8886679B2 (en) * | 2011-10-11 | 2014-11-11 | Hewlett-Packard Development Company, L.P. | Mining web applications |
US20130091414A1 (en) * | 2011-10-11 | 2013-04-11 | Omer BARKOL | Mining Web Applications |
US8843494B1 (en) | 2012-03-28 | 2014-09-23 | Emc Corporation | Method and system for using keywords to merge document clusters |
US9069768B1 (en) | 2012-03-28 | 2015-06-30 | Emc Corporation | Method and system for creating subgroups of documents using optical character recognition data |
US8832108B1 (en) * | 2012-03-28 | 2014-09-09 | Emc Corporation | Method and system for classifying documents that have different scales |
US9396540B1 (en) | 2012-03-28 | 2016-07-19 | Emc Corporation | Method and system for identifying anchors for fields using optical character recognition data |
US8880951B2 (en) | 2012-04-06 | 2014-11-04 | Fujitsu Limited | Detection of dead widgets in software applications |
US20160093056A1 (en) * | 2014-09-29 | 2016-03-31 | Digitalglobe, Inc. | Multi-spectral image labeling with radiometric attribute vectors of image space representation components |
US9619711B2 (en) * | 2014-09-29 | 2017-04-11 | Digitalglobe, Inc. | Multi-spectral image labeling with radiometric attribute vectors of image space representation components |
CN110874526A (en) * | 2018-12-29 | 2020-03-10 | 北京安天网络安全技术有限公司 | File similarity detection method and device, electronic equipment and storage medium |
CN113885307A (en) * | 2021-10-12 | 2022-01-04 | 广东安朴电力技术有限公司 | SVG parallel machine redundancy control method, SVG control method and SVG control system |
Also Published As
Publication number | Publication date |
---|---|
WO2007042891A2 (en) | 2007-04-19 |
TW200805089A (en) | 2008-01-16 |
HK1133711A1 (en) | 2010-04-01 |
CN101589384B (en) | 2011-06-29 |
KR20080069988A (en) | 2008-07-29 |
CN101589384A (en) | 2009-11-25 |
JP2009512006A (en) | 2009-03-19 |
US7403951B2 (en) | 2008-07-22 |
EP1932085A1 (en) | 2008-06-18 |
EP1932085A4 (en) | 2013-08-21 |
KR101040094B1 (en) | 2011-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7403951B2 (en) | | System and method for measuring SVG document similarity |
US8255792B2 (en) | | Techniques for binding scalable vector graphics to associated information |
US7254285B1 (en) | | Image description system and method |
US8370869B2 (en) | | Video description system and method |
EP1125245B1 (en) | | Image description system and method |
Melnik et al. | | A layered approach to information modeling and interoperability on the web |
US6941511B1 (en) | | High-performance extensible document transformation |
US8051371B2 (en) | | Document analysis system and document adaptation system |
Adzic et al. | | A survey of multimedia content adaptation for mobile devices |
US8335779B2 (en) | | Method and apparatus for gathering, categorizing and parameterizing data |
US20030018668A1 (en) | | Enhanced transcoding of structured documents through use of annotation techniques |
US20080005147A1 (en) | | Method, apparatus and computer program product for making semantic annotations for easy file organization and search |
Zhu et al. | | Distributed skyline retrieval with low bandwidth consumption |
CA2686292A1 (en) | | Method and system for automatically generating web page transcoding instructions |
Lindholm | | XML three-way merge as a reconciliation engine for mobile data |
Benitez et al. | | Object-based multimedia content description schemes and applications for MPEG-7 |
US7333994B2 (en) | | System and method for database having relational node structure |
KR100543597B1 (en) | | Multimedia searching and browsing method |
Metso et al. | | A content model for the mobile adaptation of multimedia information |
De Virgilio et al. | | Rule-based adaptation of web information systems |
Hinze et al. | | The TIP/Greenstone bridge: A service for mobile location-based access to digital libraries |
Chmielewski | | Finding interactive 3D objects by their interaction properties |
Sjekavica et al. | | Advantages of semantic web technologies usage in the multimedia annotation and retrieval |
Lay et al. | | SOLO: an MPEG-7 optimum search tool |
Vittal | | An object-oriented multimedia database system for a news-on-demand application |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: NOKIA CORPORATION, FINLAND; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SETLUR, VIDYA; INGRASSIA, MICHAEL; CHITTURI, SURESH; AND OTHERS; REEL/FRAME: 017415/0774; Effective date: 20051031 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
AS | Assignment | Owner name: NOKIA TECHNOLOGIES OY, FINLAND; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NOKIA CORPORATION; REEL/FRAME: 035564/0861; Effective date: 20150116 |
FPAY | Fee payment | Year of fee payment: 8 |
AS | Assignment | Owner name: BP FUNDING TRUST, SERIES SPL-VI, NEW YORK; Free format text: SECURITY INTEREST; ASSIGNOR: WSOU INVESTMENTS, LLC; REEL/FRAME: 049235/0068; Effective date: 20190516 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: WSOU INVESTMENTS LLC, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NOKIA TECHNOLOGIES OY; REEL/FRAME: 052694/0303; Effective date: 20170822 |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20200722 |
AS | Assignment | Owner name: OT WSOU TERRIER HOLDINGS, LLC, CALIFORNIA; Free format text: SECURITY INTEREST; ASSIGNOR: WSOU INVESTMENTS, LLC; REEL/FRAME: 056990/0081; Effective date: 20210528 |
AS | Assignment | Owner name: WSOU INVESTMENTS, LLC, CALIFORNIA; Free format text: RELEASE BY SECURED PARTY; ASSIGNOR: TERRIER SSC, LLC; REEL/FRAME: 056526/0093; Effective date: 20210528 |