WO2004010325A1 - Method and system for transforming semantically related documents - Google Patents

Info

Publication number
WO2004010325A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
elements
construct
target
source
Application number
PCT/US2003/021624
Other languages
French (fr)
Inventor
Walter H. Lindsay
Original Assignee
Contivo, Inc.
Application filed by Contivo, Inc.
Priority to AU2003247971A
Publication of WO2004010325A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/157 Transformation using dictionaries or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/16 Automatic learning of transformation rules, e.g. from examples

Definitions

  • This invention relates to the processing of electronic documents.
  • it relates to the integration of electronic documents from diverse sources.
  • One approach to automating document integration is to represent source and target documents purely in terms of their organizational structure (hereinafter referred to as meta-data) and to define mappings between semantically equivalent structures or constructs in the meta-data for the source and target documents.
  • meta-data organizational structure
  • semantic equivalencies between source and target documents are identified and are associated through a common vocabulary of concepts.
  • a mapping between constructs in the source and target documents is automatically generated by finding all constructs in the source and target document that are associated with the same concept in the common vocabulary of concepts.
  • the construct 102 iterates within document 100. Assume that the source document 100 is to be mapped to a target document 108 which has a "Department" construct 110 and an "Item" construct 114 which, taken together, represent the semantic equivalent of "ProductLineItem" construct 102 in source document 100. The semantic equivalence between the constructs is indicated by the dotted lines in Figure 1. The solid arrows indicate that "Department" field 104 is to be mapped to "DepartmentName" field 112 and "Item" field 106 is to be mapped to an "ItemName" field 116.
  • mappings to transform data in the "ProductLineItem" construct 102 into equivalent data in the target document 108 will have to be intelligent enough to realize that when the "Department" field 104 changes from the previous iteration of "ProductLineItem" 102, a new instance of "Department" 110, "DepartmentName" 112, "ItemName" 116, and "Item" 114 in the target document 108 should be created; otherwise, only a new instance of "Item" 114 and "ItemName" 116 should be created.
  • a mapping should also specify how the mapping is to take place in order to ensure a meaningful transformation of data from source document to target document.
  • a method for transforming a source document in a source format to a target document in a target format comprising deriving a semantic model for iterative constructs in the source and target document; defining processing constraints for each iterative construct in the source document that a transformation for transforming the construct data to the equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model and the processing constraints.
  • Figure 1 shows an example of a source document which includes an iterative construct which has to be mapped to an equivalent construct in a target document
  • Figure 2 shows a flow chart of operations performed in transforming a source document in a source format to a target document in a target format, in accordance with one embodiment of the invention
  • Figure 3 shows a flow chart of operations performed in executing a block shown in Figure 2 of the drawings, in greater detail;
  • Figure 4 shows the meta-data and model for a source document
  • Figure 5 shows the meta-data and model for a target document
  • Figure 6 shows the meta-data for a document in graphical form
  • Figure 7 shows a semantic model constructed for the documents of Figures 4 and 5
  • Figure 8 shows a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with another embodiment of the invention.
  • Figure 9 shows a block diagram of a system which may be in accordance with one embodiment of the invention.
  • aspects of the present invention relate to a method and system for transforming a source document in a source format to a target document in a target format.
  • XML Extensible Markup Language
  • the syntax of the Extensible Markup Language (XML) will be used. This is because the XML syntax provides a convenient way to describe the structure of documents.
  • use of the XML syntax in this description is not intended to limit the present invention in any way
  • FIG. 2 of the drawings a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with one embodiment of the invention is shown.
  • the operations shown in Figure 2 may be performed by a system such as is shown in Figure 9 of the drawings.
  • a semantic model for iterative constructs in the source and target document is derived. Aspects of implementing block 200 are shown in greater detail in Figure 3 of the drawings.
  • a meta-data representation of the source and target document is constructed. As previously described, the meta-data describes the organizational structure and characteristics of the constructs within a document. It is convenient to represent meta-data graphically.
  • Figure 4 of the drawings shows meta-data 400 in graphical form for a source document.
  • Meta-data 400 includes information on the structure of components contained therein in column 402 and information on the number of occurrences of each component in column 404.
  • meta-data 400 indicates that there is an element a1 which is a parent of an element a2 and an element a5.
  • the elements a2 and a5 are sibling elements since they are immediate children of the element a1.
  • the element a2 has in turn a child element a3 which in turn has a child element a4.
  • the element a2 occurs a minimum and a maximum of once within the source document.
  • elements a4 and a6 occur only once per parent element within the source document.
  • FIG. 5 shows an example of meta-data 500 for a target document to which the source document shown in Figure 4 of the drawings is to be mapped.
  • Meta-data 500 contains information about a structure 502 and occurrences 504 of each component within the target document.
  • the semantic concepts to which structure 502 relates are shown in column 506.
  • a unique identifier is assigned to each occurrence of the element "a".
  • the second occurrence of the element "a" may be distinguished from the first occurrence by uniquely identifying the element e1 under which the element "a" occurs by its context within the document.
  • the element e1 has the following four contexts: p1/e1; p1/e1*2; p1/p2; p1/p2/e1;
  • the semantic model is stored in a database, for example, within a server.
  • Figure 7 of the drawings shows an example of the semantic model "700" that has been constructed for the meta-data of Figures 5 and 6. Referring to the model 700 it will be seen that constructs a3 and b5 are semantically equivalent whereas constructs a5 and b3 are semantically equivalent. Further, it will be noted that the model contains a context for each semantic construct which basically defines a unique path in the meta-data to the semantic concept.
  • the goal of the processing constraints is to specify how a mapping should be performed in order to provide a meaningful transformation.
  • the processing constraints provide a "fill order" for translating between electronic documents. Referring again to Figure 2 of the drawings, at block 204 each transformation is automatically generated based on the semantic model and the processing constraints.
  • FIG. 8 of the drawings a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with another embodiment of the invention is shown.
  • the operations shown in Figure 8 may be performed by a system such as is shown in Figure 9 of the drawings.
  • a first set of semantically related elements in the source document wherein at least one of the elements iterates is identified.
  • a second set of semantically related elements in the target document wherein at least one of the elements iterates, and wherein further, the first and second sets of elements are semantically equivalent is identified.
  • a semantic model is derived which specifies which elements in the first and second sets of elements are semantically equivalent.
  • processing rules which a transformation to transform data from the first set to data in the second set must satisfy are defined.
  • duplicate elements in the first and second set of fields are uniquely identified by assigning a unique name to each duplicate element.
  • duplicate element names are uniquely identified based on a hierarchy of each duplicate element within a set. Examples of the processing rules which a transformation must satisfy are provided below.
  • reference numeral 900 generally indicates an example of a system which may be used to implement embodiments of the invention described above.
  • the system 900 includes a memory 904, which may represent one or more physical memory devices, which may include any type of random access memory (RAM), read only memory (ROM) which may be programmable, flash memory, non-volatile mass storage device, or a combination of such memory devices.
  • RAM random access memory
  • ROM read only memory
  • the memory 904 is connected via a system bus 912 to a processor 902.
  • the memory 904 includes instructions 906 which when executed by the processor 902 cause the processor to perform the methodology of the invention as discussed above.
  • the system 900 includes a disk drive 908 and a CD ROM drive 910 each of which is coupled to a peripheral-device and user-interface 914 via bus 912.
  • Processor 902, memory 904, disk drive 908 and CD ROM 910 are generally known in the art.
  • Peripheral-device and user-interface 914 provides an interface between system bus 912 and various optional components connected to a peripheral bus 916 as well as to user interface components, such as a display, mouse and other user interface devices.
  • a network interface 918 is coupled to peripheral bus 916 and provides network connectivity to system 900.
  • Numerous examples of processing constraints or "fill orders" are now provided. In the examples 1 to 9 described below, various documents produced by a video rental store are shown. In each of the examples, the document appearing on the left is the source document and the document appearing on the right is the target document to which the source document has to be mapped.
  • Example 1 Video Store Reporting Sales of a Movie to a Remote Office
  • a video store reports sales of a single movie to a remote office.
  • a computer in the video store generates the source document on the left, which must be transformed into the target document on the right:
  • the source document has the same information as the target document.
  • the <Total> element in the target document can be computed from the <NumberSold> multiplied by the <Price> in the source document. It will be seen that the structure of the two documents is not the same. Thus, a "transformation" that defines how to convert from the source document to the target document is necessary.
  • Example 2 Video Store Reporting Sales of Several Movies
  • Example 3 Video Store Reporting Sales of Several Movies at Several
  • the pricing structure of a movie may vary, with some customers being offered a discount or sale price.
  • the source document appears on the left and the target document to which it must be transformed appears on the right.
  • Example 4 Generating a List of Stars from a Movie Catalog
  • the source document contains a video catalog update organized by video title.
  • the target document on the right contains a catalog organized by the actors who star in a movie.
  • Example 5 Reconstructing the Movie Catalog from the Actors List
  • the target and source documents of Example 5 are reversed.
  • the target document of Example is to be transformed into the source document of Example 5.
  • Example 6 Converting the Actors List into Three Separate Lists
  • a list of actors in the source document must be broken up into three separate lists, viz. a list of actors, a list of movies, and a list of years in the target document.
  • Example 8 Concatenating Two Lists into One
  • the source document contains a catalog update where the videos are separated according to new and old videos.
  • in the target document, the old and new video lists are concatenated into a single list.
  • the <Sales> element in the first document contains a <Movie> element which contains a <Title> and a <Sold> element.
  • the <Sold> element can contain the <NumberSold> and <Price> elements.
  • the meta-data conveniently summarizes this structural information. The meta-data indicates that some of the elements contain a text value.
  • the second and third examples above had iterative elements.
  • the meta-data for the documents of Example 2 is not shown, as these documents are a simpler case of the documents of Example 3, which is shown.
  • the meta-data for the documents of Example 3 is shown on the left.
  • the '*' symbol indicates that an element can repeat under its parent:
  • each <Sales> element can have multiple <Movie> elements; each <Movie> element can have multiple <Sold> elements; each <TransactionSummary> element can have multiple <Video> elements; and each <Video> can have multiple <Sales> elements.
  • the <Sales>, <Movie>, <TransactionSummary>, and <Video> elements are iterative elements.
  • a model relates document constructs and fill order information with semantic concepts.
  • the model examples in this document use paths through the metadata to identify document constructs, although other approaches are possible. If a model has two constructs associated with the same semantic concept, the constructs can be mapped. Special annotations in the paths in the model indicate how to relate instances of a construct in a real document based on the meta-data to the semantic concept. These annotations are part of the semantic model. From the annotations in the models for the source and target documents, the "fill order" needed for transforming the source to the target document is clear. In the examples in the following section, a "*" character in a model entry indicates that each instance of the corresponding construct in a real document is an instance of the semantic concept.
  • the model specifies a "for each" fill order.
  • a "for each" fill order means that a transform is to copy every instance of the construct in the source to the target.
  • Other fill order information in the model is shown in square brackets in a model entry.
  • transforms are target-centric, so model entries are sorted from the shortest to the longest target-side entries, and within that, shortest to longest source-side entries. "Fill Order" information in the model is applied left to right from the model entries.
  • Example 3 Only Example 3 is discussed, as Example 2 is really a simplification of Example 3.
  • the meta-data is:
  • a semantic model for the source and target documents of this example would specify that the construct or element Sales/Movie is to be mapped to the construct TransactionSummary/Video and that the construct Sales/Movie/Sold is to be mapped to the construct TransactionSummary/Video/Sales.
  • the processing constraints or fill order that a transformation in transforming between the source and target documents of this example must satisfy include:
  • the model above relates <Sales/Movie> and <TransactionSummary/Video> with the same semantic concept. Both of the model entries end in a "*", so the fill order is: for each <Movie> create a <Video>.
  • the model similarly relates <Sales/Movie/Sold> and <TransactionSummary/Video/Sales>. Because <Sales/Movie> and <TransactionSummary/Video> have already been mapped and a fill order specified, the "*" character after <Movie> and <Video> can be ignored at this point.
  • a semantic model for the source and target documents of this example would specify that the construct or element CatalogUpdate/Video is to be mapped to the construct StarList/Star and CatalogUpdate/Video/Actor is also to be mapped to the construct StarList/Star.
  • the fill order or processing constraint that a transformation for transforming between the source and target documents of this example must satisfy includes:
  • the model for the meta-data and fill order is:
  • the fill order for this mapping can be described with a single source and a single target model entry, even though the fill order is complex.
  • the model entries are traversed from left to right to identify mappings and fill order/processing constraints. <CatalogUpdate/Video> in the source and <StarList/Star> in the target each have a "*", so they are mapped with a "for each" fill order.
  • "[Target:GroupBy:Title,Year]" applies only to the target, so is ignored here.
  • a semantic model for the source and target documents of this example would specify that the construct or element StarList/Star is to be mapped to the constructs CatalogUpdate/Video and CatalogUpdate/Video/Actor.
  • the fill order or processing constraints that a transformation for transforming between the source and target documents of this example must satisfy include: For every <Star>, create a new <Video> record if the <Title> or <Year> holds a different value than in the previous <Star>, and create a new <Actor> record.
  • the model for the meta-data and fill order is:
  • the model specifies that the element <StarList/Star> be mapped with two elements, viz. <CatalogUpdate/Video> and <CatalogUpdate/Video/Actor>.
  • the model also specifies the special fill order instruction "
  • a transformation must group all the <CatalogUpdate/Video> records that have identical values for the <Title> and <Year> elements.
  • a new <Video> is created only when the group by condition of an element is different than the group by condition of the previous element.
  • a new <Actor> element is always created.
  • the model satisfies the fill order constraints needed for this transform.
  • Example 6 The meta-data is:
  • a semantic model for the source and target documents of this example would specify that the construct or element StarList/Star is to be mapped to the construct ThreeLists/Actor, the construct StarList/Star is to be mapped to the construct ThreeLists/Title and that the construct StarList/Star is to be mapped to the construct ThreeLists/Year.
  • the fill order is: Make three passes over the source document, such that in pass one: for each <Star>, generate an <Actor>; in pass two: for each <Star>, generate a <Title>; and in pass three: for each <Star>, generate a <Year>.
  • the model for the meta-data and fill order is:
  • the model specifies three semantic matches between the source and the target. Iterating construct <StarList/Star> maps to <ThreeLists/Actor> with a "for each" fill order. <StarList/Star> maps to <ThreeLists/Title> with a "for each" fill order. <StarList/Star> maps to <ThreeLists/Year> with a "for each" fill order. If the transform is processed in a target-centric fashion, top to bottom, left to right in terms of the target document meta-data and not the source document meta-data, then the transform cannot generate all three target iterating constructs simultaneously, since the three are independent of each other. Thus, the first must be processed, then the second, then the third. Thus the model produces the mapping and fill order needed to transform the source to the target.
  • [0055] Example 7 The meta-data is:
  • a semantic model for the source and target documents of this example would specify that the construct or element ThreeLists/Actor is to be mapped to the construct StarList/Star, the construct ThreeLists/Title is to be mapped to the construct StarList/Star and that the construct ThreeLists/Year is to be mapped to the construct StarList/Star.
  • the fill order is: For each unique "triple" of <Actor>, <Title> and <Year>, create a new <Star>.
  • the model for the meta-data and fill order is:
  • the model specifies mappings from <ThreeLists/Actor>, <ThreeLists/Title> and <ThreeLists/Year> in the source document to <StarList/Star> in the target document, each with a "for each" fill order.
  • the three source constructs are independent of each other, and the transformation is target-centric, thus all three maps are processed simultaneously as three lists in lock step with each other, producing the fill order of the transformation.
  • a semantic model for the source and target documents of this example would specify that the constructs or elements AgeCatalogUpdate/OldVideo and AgeCatalogUpdate/NewVideo are to be mapped to the construct CatalogUpdate/Video and the constructs AgeCatalogUpdate/OldVideo/Actor and AgeCatalogUpdate/NewVideo/Actor are to be mapped to the construct CatalogUpdate/Video/Actor.
  • the fill order is:
  • the model for the meta-data and fill order is:
  • the model entries for the video semantic concepts are applied before the model entries for the actor semantic concepts because the target-side construct <CatalogUpdate/Video> is higher up in the meta-data than <CatalogUpdate/Video/Actor>.
  • either of the source-side constructs can be matched first because neither source-side construct is under the other.
  • the independent source-side constructs <AgeCatalogUpdate/OldVideo> and <AgeCatalogUpdate/NewVideo> are both mapped to <CatalogUpdate/Video> in the target.
  • the fill order for both is "for each" with the "Source:Excl" (exclusive) modifier, so that each is processed separately, one after the other, unlike the parallel-list "for each" processing of Example 7.
  • the model specifies that the source elements <AgeCatalogUpdate/OldVideo/Actor> and <AgeCatalogUpdate/NewVideo/Actor> are both mapped to the target element <CatalogUpdate/Video/Actor>.
  • a semantic model for the source and target documents of this example would specify that the construct or element CatalogUpdate/Video is to be mapped to the constructs AgeCatalogUpdate/OldVideo and AgeCatalogUpdate/NewVideo, and that the construct CatalogUpdate/Video/Actor is to be mapped to the constructs AgeCatalogUpdate/OldVideo/Actor and AgeCatalogUpdate/NewVideo/Actor.
  • the fill order is:
  • <CatalogUpdate/Video> is mapped to <AgeCatalogUpdate/OldVideo> and <CatalogUpdate/Video> is mapped to <AgeCatalogUpdate/NewVideo>.
  • Embodiments of the present invention allow conditional mappings, i.e., mappings in which a target element is created if and only if the condition is met.
  • an <OldVideo> construct will be created only if the value in its <Year> field is less than or equal to 1970.
  • the mapping from <CatalogUpdate/Video> to <AgeCatalogUpdate/NewVideo> similarly has a "for each" with a condition that the value in the <Year> field is greater than 1970.
  • <CatalogUpdate/Video/Actor> is mapped to both <AgeCatalogUpdate/OldVideo/Actor> and
  • the source document comprises a list of videos, where a video can have multiple actors.
  • the source document is to be converted into a target document comprising a list of actors wherein each actor could have starred in multiple videos.
  • the meta-data is:
  • a semantic model for the source and target documents for this example would specify that the construct or element CatalogUpdate/Video/Actor is to be mapped to the construct GroupedStarList/Actor and that the construct CatalogUpdate/Video is to be mapped to the construct GroupedStarList/Actor/Movie.
  • the fill order is: For every <CatalogUpdate/Video/Actor> in the source, first create a <GroupedStarList/Actor> element. Then create a single <Movie> element. Populate the <ActorName>, <Movie/Title> and <Movie/Year> elements with data from the source. Reorganize the target document, to group all the <Movie> elements under the correct <Actor>.
  • the model for the meta-data and fill order is:
  • the actor semantic concept is applied first, because its target model entry is the parent of the video target model entry.
  • <CatalogUpdate/Video> and <CatalogUpdate/Video/Actor> in the source document both map to <GroupedStarList/Actor> in the target with a "for each" fill order that groups the target by the values in <ActorName>.
  • <GroupedStarList/Actor> in the target is already mapped, but <GroupedStarList/Actor/Movie> is not.
  • <CatalogUpdate/Video> has already been mapped.
  • Because <CatalogUpdate/Video> is an ancestor of <CatalogUpdate/Video/Actor> and remapping <CatalogUpdate/Video> to <GroupedStarList/Actor/Movie> satisfies the video model entries and the actor model entries, it is remapped, with a "for each" fill order.
  • the processing is done in two stages: first, the data is moved from the source document to the target document, and thereafter it is organized in terms of the group-by clause.
  • [0067] It will be appreciated that modeling the meta-data of a document once allows the document to be mapped to multiple independent target or multiple independent source documents. By simple extension of the techniques described above, additional documents can be modeled and mapped.
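The two-stage processing described for the grouped star list (move the data, then organize it by the group-by clause) can be illustrated with a minimal Python sketch. This is only one possible reading of the excerpt, not the patent's implementation: the dictionary keys `title`, `year` and `actors`, the function names, and the sample values are all hypothetical, and plain dictionaries stand in for the XML documents.

```python
def move(catalog):
    """Stage one: for every source Video/Actor, create one Actor record
    holding a single Movie (hypothetical dict layout, not from the patent)."""
    flat = []
    for video in catalog:
        for actor_name in video["actors"]:
            flat.append({
                "ActorName": actor_name,
                "Movie": [{"Title": video["title"], "Year": video["year"]}],
            })
    return flat


def group_by_actor(flat):
    """Stage two: reorganize the target so that all Movie entries sit under
    one Actor per distinct ActorName, in first-seen order."""
    grouped = {}
    for record in flat:
        name = record["ActorName"]
        entry = grouped.setdefault(name, {"ActorName": name, "Movie": []})
        entry["Movie"].extend(record["Movie"])
    return list(grouped.values())


# Placeholder data, invented purely for illustration.
catalog = [
    {"title": "Title A", "year": 1990, "actors": ["Actor X", "Actor Y"]},
    {"title": "Title B", "year": 1995, "actors": ["Actor X"]},
]
grouped_star_list = group_by_actor(move(catalog))
```

Keeping the two stages as separate functions mirrors the description: the first pass is a plain "for each" copy, and only the second pass needs to know about the grouping key.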

Abstract

The invention provides a method and system for transforming a source document in a source format to a target document in a target format. According to one embodiment, the method comprises deriving a semantic model for iterative constructs in the source and target documents including defining processing constraints for each iterative construct in the source and target documents that a transformation for transforming a construct in the source document to its equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model and the processing constraints.

Description

METHOD AND SYSTEM FOR TRANSFORMING SEMANTICALLY RELATED DOCUMENTS
FIELD OF THE INVENTION
[0001] This invention relates to the processing of electronic documents. In particular, it relates to the integration of electronic documents from diverse sources.
BACKGROUND
[0002] Modern business enterprises today face the challenge of integrating business documents from diverse document sources or systems. Such integration can be beneficial in achieving operational competitiveness by integrating internal systems to obtain a single view of an enterprise's data. Alternatively, such integration can be beneficial in achieving collaborative competitiveness by integrating an enterprise's systems with those of strategic trading partners.
[0003] For reasons of speed, accuracy, and convenience it is desirable to automate document integration as far as possible. One approach to automating document integration is to represent source and target documents purely in terms of their organizational structure (hereinafter referred to as meta-data) and to define mappings between semantically equivalent structures or constructs in the meta-data for the source and target documents. With this approach, semantic equivalencies between source and target documents are identified and are associated through a common vocabulary of concepts. Thereafter, a mapping between constructs in the source and target documents is automatically generated by finding all constructs in the source and target document that are associated with the same concept in the common vocabulary of concepts. While the notion of using semantic equivalencies between documents to automatically generate such mappings is easy to understand, it is difficult to implement in the case of documents which have iterative constructs; i.e., constructs which iterate within the document and which may themselves have constructs which iterate within the construct. As an example of the difficulty of automatically generating a mapping between constructs which iterate, consider the example shown in Figure 1 of the drawings. Referring to Figure 1, a source document 100 is shown to have an iterative construct 102 entitled "ProductLineItem." The construct 102 is iterative because it can occur between one and an arbitrary number of times within source document 100. The fields of construct 102 include a "Department" field 104 and an "Item" field 106. Assume that the source document 100 is to be mapped to a target document 108 which has a "Department" construct 110 and an "Item" construct 114 which, taken together, represent the semantic equivalent of "ProductLineItem" construct 102 in source document 100.
The semantic equivalence between the constructs is indicated by the dotted lines in Figure 1. The solid arrows indicate that "Department" field 104 is to be mapped to "DepartmentName" field 112 and "Item" field 106 is to be mapped to an "ItemName" field 116. Any mapping to transform data in the "ProductLineItem" construct 102 into equivalent data in the target document 108 will have to be intelligent enough to realize that when the "Department" field 104 changes from the previous iteration of "ProductLineItem" 102, a new instance of "Department" 110, "DepartmentName" 112, "ItemName" 116, and "Item" 114 in the target document 108 should be created; otherwise, only a new instance of "Item" 114 and "ItemName" 116 should be created. Thus, it will be appreciated that besides determining which fields in a source and target document are equivalent and therefore should be mapped, a mapping should also specify how the mapping is to take place in order to ensure a meaningful transformation of data from source document to target document.
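The Figure 1 behaviour described above (open a new "Department" only when the "Department" field differs from the previous "ProductLineItem" iteration) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the `Department` dataclass, the function name, and the representation of each line item as a `(department, item)` pair are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Department:
    """Hypothetical stand-in for target construct 110 with its Item children."""
    department_name: str
    item_names: List[str] = field(default_factory=list)


def transform_line_items(line_items: Iterable[Tuple[str, str]]) -> List[Department]:
    """Walk ProductLineItem iterations in document order.

    A new Department is started only when the department value differs from
    the previous iteration; an Item/ItemName is created on every iteration.
    """
    departments: List[Department] = []
    previous = object()  # sentinel that never equals a real department value
    for department, item in line_items:
        if department != previous:
            departments.append(Department(department))
            previous = department
        departments[-1].item_names.append(item)
    return departments


# Placeholder data, invented purely for illustration.
result = transform_line_items([
    ("Toys", "Ball"),
    ("Toys", "Kite"),
    ("Garden", "Rake"),
    ("Toys", "Doll"),
])
```

Because the rule compares only with the previous iteration, a department value that reappears later (the last pair in the sample) starts a new "Department" instance rather than being merged into the earlier one.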
[0004] Because of the above problem of automatically generating mappings between iterative constructs, one approach has been to map the iterative constructs manually. Another approach has been to perform the mapping based on a heuristic and to allow user input to correct errors in the mapping. Neither approach is satisfactory and there is therefore a need to be able to map iterative constructs automatically.
SUMMARY OF THE INVENTION
[0005] According to one aspect of the invention there is provided a method for transforming a source document in a source format to a target document in a target format, the method comprising deriving a semantic model for iterative constructs in the source and target documents; defining, for each iterative construct in the source document, processing constraints that a transformation for transforming the construct data to the equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model and the processing constraints.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Figure 1 shows an example of a source document which includes an iterative construct which has to be mapped to an equivalent construct in a target document;
[0007] Figure 2 shows a flow chart of operations performed in transforming a source document in a source format to a target document in a target format, in accordance with one embodiment of the invention;
[0008] Figure 3 shows a flow chart of operations performed in executing a block shown in Figure 2 of the drawings, in greater detail;
[0009] Figure 4 shows the meta-data and model for a source document;
[0010] Figure 5 shows the meta-data and model for a target document;
[0011] Figure 6 shows the meta-data for a document in graphical form;
[0012] Figure 7 shows a semantic model constructed for the documents of Figures 4 and 5;
[0013] Figure 8 shows a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with another embodiment of the invention; and
[0014] Figure 9 shows a block diagram of a system which may be used in accordance with one embodiment of the invention.
DETAILED DESCRIPTION
[0015] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention.
[0016] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
[0017] Aspects of the present invention relate to a method and system for transforming a source document in a source format to a target document in a target format. In order to facilitate discussion of the present invention the syntax of the Extensible Markup Language (XML) will be used, because XML provides a convenient syntax for describing the structure of documents. However, use of the XML syntax in this description is not intended to limit the present invention in any way.
[0018] According to XML syntax, if, for example, an element called "Person" is to be defined, wherein the element "Person" contains two other elements, viz. element "FirstName" and element "LastName", then the start of the person element is indicated by the tag "<Person>" and the end of the person element is indicated by the tag "</Person>". It will be noted that the start and end tags are identical, except that the end tag has a "/" at its beginning. In order to show nesting within an element, indentation is used and the actual data within a document is indicated by bold typeface. Thus, for example, the person having first name "Walter" and last name "Lindsay" will be expressed in XML syntax as follows:
<Person>
  <FirstName>Walter</FirstName>
  <LastName>Lindsay</LastName>
</Person>
[0019] In order to create models of documents it is often convenient to express a document's structure and characteristics independent of the actual data within the document. Such a representation of a document's structure and characteristics without the actual data is referred to as the "meta-data". For example, consider a sales document which indicates the title, number, and price of movies sold. Such a document may be depicted in XML as follows:
<Sales>
  <Movie>
    <Title>My Fair Lady</Title>
    <Sold>
      <NumberSold>10</NumberSold>
      <Price>9.99</Price>
    </Sold>
  </Movie>
</Sales>
[0020] The meta-data for this document which indicates the structure and characteristics thereof is shown below:
Sales
  Movie
    Title text
    Sold
      NumberSold text
      Price text
[0021] Referring to the meta-data it will be seen that it conveniently summarizes the structure and characteristics of the sales document. Where an element can repeat under its parent, a "*" symbol indicates this. With real data, the "Movie" and "Sold" elements within the sales document may repeat, and thus a more accurate form of meta-data for the sales document is shown below:
Sales
  Movie*
    Title text
    Sold*
      NumberSold text
      Price text
[0022] Referring to Figure 2 of the drawings, a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with one embodiment of the invention is shown. The operations shown in Figure 2 may be performed by a system such as is shown in Figure 9 of the drawings. At block 200 a semantic model for iterative constructs in the source and target documents is derived. Aspects of implementing block 200 are shown in greater detail in Figure 3 of the drawings. Referring to Figure 3, at block 302 a meta-data representation of the source and target documents is constructed. As previously described, the meta-data describes the organizational structure and characteristics of the constructs within a document. It is convenient to represent meta-data graphically. As an example, Figure 4 of the drawings shows meta-data 400 in graphical form for a source document. Meta-data 400 includes information on the structure of the components contained therein in column 402 and information on the number of occurrences of each component in column 404. Thus, meta-data 400 indicates that there is an element a1 which is a parent of an element a2 and an element a5. The elements a2 and a5 are sibling elements since they are immediate children of the element a1. The element a2 has in turn a child element a3 which in turn has a child element a4. The element a2 occurs a minimum and a maximum of once within the source document. Likewise, elements a4 and a6 occur only once per parent element within the source document. On the other hand, elements a3 and a5 iterate and can occur anywhere between zero and an unbounded number of times. The iterative nature of the elements a3 and a5 is indicated by the "*" symbol.
[0023] Figure 5 shows an example of meta-data 500 for a target document to which the source document shown in Figure 4 of the drawings is to be mapped.
Meta-data 500 contains information about a structure 502 and occurrences 504 of each component within the target document. The semantic concepts to which structure 502 relates are shown in column 506.
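As an illustration of how a meta-data representation of the kind described above might be obtained from a sample document, the following Python sketch walks an XML instance and prints an indented outline, marking with "*" any element observed to repeat under its parent. This is only a sketch under stated assumptions: the function names are hypothetical, and a single instance document can reveal only the repetition that actually occurs in it, whereas a schema would state the permitted occurrences directly.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def new_node():
    return {"text": False, "repeats": False, "children": {}}


def collect(element, node):
    """Merge the structure of one element instance into the meta-data node."""
    if (element.text or "").strip():
        node["text"] = True
    counts = Counter(child.tag for child in element)
    for child in element:
        child_node = node["children"].setdefault(child.tag, new_node())
        # An element is marked iterative if it repeats under any parent instance.
        if counts[child.tag] > 1:
            child_node["repeats"] = True
        collect(child, child_node)


def render(tag, node, depth=0):
    """Return the meta-data outline as a list of indented lines."""
    label = tag + ("*" if node["repeats"] else "") + (" text" if node["text"] else "")
    lines = ["  " * depth + label]
    for child_tag, child_node in node["children"].items():
        lines.extend(render(child_tag, child_node, depth + 1))
    return lines


document = ET.fromstring(
    "<Sales>"
    "<Movie><Title>My Fair Lady</Title>"
    "<Sold><NumberSold>10</NumberSold><Price>9.99</Price></Sold></Movie>"
    "<Movie><Title>Cats and Dogs</Title>"
    "<Sold><NumberSold>5</NumberSold><Price>8.99</Price></Sold>"
    "<Sold><NumberSold>2</NumberSold><Price>10.99</Price></Sold></Movie>"
    "</Sales>"
)
root_node = new_node()
collect(document, root_node)
outline = render(document.tag, root_node)
print("\n".join(outline))
```

Merging every instance into one node per tag is what lets "Sold" be marked iterative even though the first "Movie" in the sample contains only one "Sold" element.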
[0024] Referring again to Figure 3, at block 302 the iterative constructs and the meta-data are disambiguated. This step is necessary since it is possible that an electronic document may contain multiple instances of a single portion of meta-data. For example, consider the meta-data 600 for an exemplary document shown in Figure 6 of the drawings. In this document, e1 appears three times, viz. twice under p1 and once under p2; a and b each appear four times, viz. once under p2 and once under each of the e1 elements. Thus, in order to refer to the element "a" it is necessary to specify which element "a" is being referenced, i.e. to disambiguate the meaning of the element "a" in the example shown in Figure 6. One way of doing this is to define the position of the element "a" in relation to a fixed reference in the meta-data. A convenient fixed reference is the element p1 which serves as the root of the hierarchy shown in Figure 6. Thus, using p1 as the root and "/" as the separator, the following distinct paths to the element "a" may be defined: p1/e1/a; p1/e1/a (i.e. the second e1); p1/p2/a; and p1/p2/e1/a.
[0025] In order to disambiguate the element "a", according to one embodiment of the invention, a unique identifier is assigned to each occurrence of the element "a". For example, the second occurrence of the element "a" may be distinguished from the first occurrence by uniquely identifying the element e1 under which the element "a" occurs by its context within the document. Thus, in the example shown in Figure 6 of the drawings, the element e1 has the following four contexts: p1/e1; p1/e1^2; p1/p2; p1/p2/e1.
[0026] The "^" operator has been used to distinguish the second occurrence of the element e1. It will be appreciated that other methods may be used to distinguish the context of a construct.
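The context-based disambiguation described above can be sketched in Python as follows. The sketch assumes the hierarchy of Figure 6 as it is described in the text (two e1 elements and one p2 under p1, a further e1 under p2, with a and b under p2 and under every e1); the tuple representation and the function name are illustrative choices only.

```python
from collections import Counter


def unique_paths(node, prefix=""):
    """Return one unique "/"-separated path per construct in the meta-data tree."""
    name, children = node
    path = f"{prefix}/{name}" if prefix else name
    paths = [path]
    seen = Counter()
    for child_name, grandchildren in children:
        seen[child_name] += 1
        # The first occurrence keeps its plain name; the n-th gets a "^n" suffix.
        if seen[child_name] == 1:
            label = child_name
        else:
            label = f"{child_name}^{seen[child_name]}"
        paths.extend(unique_paths((label, grandchildren), path))
    return paths


def leaf(name):
    return (name, [])


# Meta-data of Figure 6 as inferred from the description (an assumption).
e1 = ("e1", [leaf("a"), leaf("b")])
figure6 = ("p1", [e1, e1, ("p2", [leaf("a"), leaf("b"), e1])])

paths = unique_paths(figure6)
a_paths = [p for p in paths if p.endswith("/a")]
print(a_paths)
```

Every path produced this way is distinct, so each occurrence of "a" can be referenced unambiguously by its context.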
[0027] Referring again to Figure 3 of the drawings, at block 304 the semantic composition of the meta-data of the source and target document in terms of constructs and the context is described. This is illustrated in Figure 4 of the drawings where column 406 assigns the semantic concept associated with elements in the meta-data 400. It will be noted that the element a3 has been assigned semantic concept SC1 and element a5 has been assigned semantic concept SC2. Similarly in the target document shown in Figure 5 it will be seen that element b3 has been assigned semantic concept SC1 and element b5 has been assigned semantic concept SC2. Once elements in the meta-data have been assigned a semantic concept and a unique context then the semantic model for the meta-data has been defined.
[0028] Thereafter, at block 306 the semantic model is stored in a database, for example, within a server. Figure 7 of the drawings shows an example of the semantic model 700 that has been constructed for the meta-data of Figures 4 and 5. Referring to the model 700 it will be seen that constructs a3 and b5 are semantically equivalent whereas constructs a5 and b3 are semantically equivalent. Further, it will be noted that the model contains a context for each semantic construct which defines a unique path in the meta-data to the semantic concept. Once the semantic model has been defined, then at block 202 of Figure 2 of the drawings, processing constraints are defined for each construct in the source document; these are constraints that a transformation for transforming the construct to its equivalent construct in the target document must satisfy. The goal of the processing constraints is to specify how a mapping should be performed in order to provide a meaningful transformation. Viewed in another way, the processing constraints provide a "fill order" for translating between electronic documents. Referring again to Figure 2 of the drawings, at block 204 each transformation is automatically generated based on the semantic model and the processing constraints.
[0029] Referring now to Figure 8 of the drawings, a flow chart of operations performed in transforming a source document in a source format to a target document in a target format in accordance with another embodiment of the invention is shown. The operations shown in Figure 8 may be performed by a system such as is shown in Figure 9 of the drawings. At block 800 a first set of semantically related elements in the source document is identified, wherein at least one of the elements iterates. Thereafter, at block 802 a second set of semantically related elements in the target document is identified, wherein at least one of the elements iterates, and wherein further, the first and second sets of elements are semantically equivalent. At block 804 a semantic model is derived which specifies which elements in the first and second sets of elements are semantically equivalent. At block 804 processing rules are defined which a transformation to transform data from the first set to data in the second set must satisfy. In deriving the semantic model in block 804, duplicate elements in the first and second sets of elements are uniquely identified by assigning a unique name to each duplicate element. In an alternative embodiment, duplicate element names are uniquely identified based on the position of each duplicate element in the hierarchy within a set. Examples of the processing rules which a transformation must satisfy are provided below.
[0030] Referring now to Figure 9 of the drawings, reference numeral 900 generally indicates an example of a system which may be used to implement embodiments of the invention described above. The system 900 includes a memory 904, which may represent one or more physical memory devices, which may include any type of random access memory (RAM), read only memory (ROM) which may be programmable, flash memory, non-volatile mass storage device, or a combination of such memory devices. The memory 904 is connected via a system bus 912 to a processor 902. The memory 904 includes instructions 906 which when executed by the processor 902 cause the processor to perform the methodology of the invention as discussed above. Additionally, the system 900 includes a disk drive 908 and a CD ROM drive 910 each of which is coupled to a peripheral-device and user-interface 914 via bus 912. Processor 902, memory 904, disk drive 908 and CD ROM 910 are generally known in the art. Peripheral-device and user-interface 914 provides an interface between system bus 912 and various optional components connected to a peripheral bus 916 as well as to user interface components, such as a display, mouse and other user interface devices. A network interface 918 is coupled to peripheral bus 916 and provides network connectivity to system 900.
[0031] Numerous examples of processing constraints or "fill orders" are now provided. In Examples 1 to 9 described below, various documents produced by a video rental store are shown. In each of the examples, the document appearing on the left is the source document and the document appearing on the right is the target document to which the source document has to be mapped.
[0032] Example 1: Video Store Reporting Sales of a Movie to a Remote Office
In this example, a video store reports sales of a single movie to a remote office. A computer in the video store generates the source document on the left, which must be transformed into the target document on the right:
Figure imgf000012_0001
[0033] In this case, the source document has the same information as the target document. The <Total> element in the target document can be computed from the <NumberSold> multiplied by the <Price> in the source document. It will be seen that the structure of the two documents is not the same. Thus, a "transformation" that defines how to convert from the source document to the target document is necessary.
[0034] Example 2: Video Store Reporting Sales of Several Movies
Assume that the video store reports the sales of several movies in a single source document, which is shown on the left in the table below. The document on the right is the target document into which the source document must be transformed.
Figure imgf000013_0001
[0035] Example 3: Video Store Reporting Sales of Several Movies at Several Prices
In this example, in the source document the pricing structure of a movie may vary, with some customers being offered a discount or sale price. As above, the source document appears on the left and the target document to which it must be transformed appears on the right.
Figure imgf000014_0001
[0036] Example 4: Generating a List of Stars from a Movie Catalog
In this example, the source document on the left contains a video catalog update organized by video title. The target document on the right contains a catalog organized by the actors who star in a movie.
Figure imgf000015_0001
<Actor>Brad Pitt</Actor> <Title>Fight Club</Title> <Year>1999</Year>
</Star>
<Star>
<Actor>Helena Bonham-Carter</Actor> <Title>Fight Club</Title> <Year>1999</Year>
</Star>
<Star>
<Actor>Meat Loaf</Actor> <Title>Fight Club</Title> <Year>1999</Year>
</Star>
<Star>
<Actor>Jared Leto</Actor> <Title>Fight Club</Title> <Year>1999</Year>
</Star>
<Star>
<Actor>Russell Crowe</Actor> <Title>Gladiator</Title> <Year>2000</Year>
</Star>
<Star>
<Actor>Joaquin Phoenix</Actor> <Title>Gladiator</Title> <Year>2000</Year>
</Star>
<Star>
<Actor>Richard Harris</Actor> <Title>Gladiator</Title> <Year>2000</Year>
</Star>
Figure imgf000017_0001
[0037] Example 5: Reconstructing the Movie Catalog from the Actors List
In this example, the target and source documents of Example 4 are reversed. Thus, the target document of Example 4 is to be transformed into the source document of Example 4.
[0038] Example 6: Converting the Actors List into Three Separate Lists
In this example a list of actors in the source document must be broken up into three separate lists in the target document, viz. a list of actors, a list of movies, and a list of years.
Figure imgf000017_0002
Figure imgf000018_0001
<Title>Gladiator</Title>
<Year>2000</Year>
</Star>
<Star>
<Actor>Joaquin Phoenix</Actor>
<Title>Gladiator</Title>
<Year>2000</Year>
</Star>
<Star>
<Actor>Richard Harris</Actor>
<Title>Gladiator</Title>
<Year>2000</Year>
</Star>
<Star>
<Actor>Connie Nielsen</Actor>
<Title>Gladiator</Title>
<Year>2000</Year>
</Star>
<Star>
<Actor>Dijimon Hounsou</Actor>
<Title>Gladiator</Title>
<Year>2000</Year>
</Star>
</StarList>
[0039] Example 7: Converting the Three Separate Lists back into the Actors List
In this example, the source and target documents of Example 6 are reversed.
[0040] Example 8: Concatenating Two Lists into One
In this example the source document contains a catalog update where the videos are separated according to new and old videos. In the target document the old and new video lists are concatenated into a single list.
Figure imgf000019_0001
Figure imgf000020_0001
Figure imgf000021_0001
[0041] Example 9: Splitting One List into Two
In this example, the source and target documents of Example 8 are reversed.
[0042] Meta-data for Example 1
The tables below show the meta-data on the left and the actual documents from Example 1 on the right.
Figure imgf000021_0002
Figure imgf000022_0001
Figure imgf000022_0002
[0043] It will be seen that the <Sales> element in the first document contains a <Movie> element which contains a <Title> and a <Sold> element. The <Sold> element can contain the <NumberSold> and <Price> elements. The meta-data conveniently summarizes this structural information. The meta-data indicates that some of the elements contain a text value.
[0044] Meta-data for Examples 2 and 3:
The second and third examples above had iterative elements. The meta-data for the documents of Example 2 is not shown, as these documents are a simpler case of the documents of Example 3, which is shown. The meta-data for the documents of Example 3 is shown on the left. The '*' symbol indicates that an element can repeat under its parent:
Figure imgf000022_0003
<NumberSold>2</NumberSold>
<Price>12.99</Price>
</Sold>
</Movie>
<Movie>
<Title>Cats and Dogs</Title>
<Sold>
<NumberSold>5</NumberSold>
<Price>8.99</Price>
</Sold>
<Sold>
<NumberSold>2</NumberSold>
<Price>10.99</Price>
</Sold>
<Sold>
<NumberSold>1</NumberSold>
<Price>12.99</Price>
</Sold>
</Movie>
</Sales>
Figure imgf000023_0001
Figure imgf000024_0001
[0045] The meta-data shows that: each <Sales> element can have multiple <Movie> elements; each <Movie> element can have multiple <Sold> elements; each <TransactionSummary> element can have multiple <Video> elements; and each <Video> can have multiple <Sales> elements. Thus, the <Sales>, <Movie>, <TransactionSummary>, and <Video> elements are iterative elements.
[0046] Meta-data for Examples 4 through 9
The meta-data for the documents of Examples 4 through 9 are shown on the left in the tables below and the actual documents are shown on the right.
Figure imgf000024_0002
Figure imgf000025_0001
Figure imgf000025_0002
Figure imgf000026_0001
Figure imgf000027_0001
Figure imgf000028_0001
Figure imgf000028_0002
<Title>Fight Club</Title>
<Title>Fight Club</Title>
<Title>Fight Club</Title>
<Title>Gladiator</Title>
<Title>Gladiator</Title>
<Title>Gladiator</Title>
<Title>Gladiator</Title>
<Title>Gladiator</Title>
<Year>1964</Year>
<Year>1964</Year>
<Year>1964</Year>
<Year>1964</Year>
<Year>1964</Year>
<Year>1999</Year>
<Year>1999</Year>
<Year>1999</Year>
<Year>1999</Year>
<Year>1999</Year>
<Year>2000</Year>
<Year>2000</Year>
<Year>2000</Year>
<Year>2000</Year>
<Year>2000</Year>
</ThreeLists>
Figure imgf000029_0001
Brett</Actor>
</OldVideo>
<OldVideo>
<Title>It's a Wonderful Life</Title>
<Year>1946</Year>
<Actor>James Stewart</Actor>
<Actor>Donna Reed</Actor>
<Actor>Lionel Barrymore</Actor>
<Actor>Thomas Mitchell</Actor>
<Actor>Henry Travers</Actor>
</OldVideo>
<OldVideo>
<Title>Gone With the Wind</Title>
<Year>1939</Year>
<Actor>Vivian Leigh</Actor>
<Actor>Clark Gable</Actor>
<Actor>Olivia de Havilland</Actor>
<Actor>Hattie McDaniel</Actor>
<Actor>Leslie Howard</Actor>
</OldVideo>
<NewVideo>
<Title>Fight Club</Title>
<Year>1999</Year>
<Actor>Edward Norton</Actor>
<Actor>Brad Pitt</Actor>
<Actor>Helena Bonham-Carter</Actor>
<Actor>Meat Loaf</Actor>
Figure imgf000031_0001
[0047] Discussion of "Fill Orders" and Models
A model relates document constructs and fill order information with semantic concepts. The model examples in this document use paths through the metadata to identify document constructs, although other approaches are possible. If a model has two constructs associated with the same semantic concept, the constructs can be mapped. Special annotations in the paths in the model indicate how to relate instances of a construct in a real document based on the meta-data to the semantic concept. These annotations are part of the semantic model. From the annotations in the models for the source and target documents, the "fill order" needed for transforming the source to the target document is clear. In the examples in the following section, a "*" character in a model entry indicates that each instance of the corresponding construct in a real document is an instance of the semantic concept. Thus, if a "*" appears in both the source and the target model entry, the model specifies a "for each" fill order. A "for each" fill order means that a transform is to copy every instance of the construct in the source to the target. Other fill order information in the model is shown in square brackets in a model entry. Thus, from the semantic concepts related to the source and target documents, the mapping and fill order can be derived.
Generating a transform from a model requires knowing the order to apply the model entries. In the following examples, transforms are target-centric, so model entries are sorted from the shortest to the longest target-side entries, and within that, shortest to longest source-side entries. "Fill Order" information in the model is applied left to right from the model entries.
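The two ideas above, pairing constructs that share a semantic concept and then ordering the resulting entries in a target-centric fashion, can be sketched in Python. The sketch makes several assumptions that are not stated in this description: a model is represented as a list of (path, concept) pairs, "shortest" is taken to mean the fewest path segments, and the concept names "video" and "sale" are hypothetical labels.

```python
from collections import defaultdict


def depth(path):
    """Number of segments in a "/"-separated model path."""
    return len(path.split("/"))


def generate_mappings(source_model, target_model):
    """Pair source and target constructs associated with the same concept."""
    sources_by_concept = defaultdict(list)
    for path, concept in source_model:
        sources_by_concept[concept].append(path)
    mappings = [
        (source_path, target_path)
        for target_path, concept in target_model
        for source_path in sources_by_concept.get(concept, [])
    ]
    # Target-centric order: shortest target-side entry first,
    # and within that, shortest source-side entry first.
    return sorted(mappings, key=lambda m: (depth(m[1]), depth(m[0])))


# Model entries in the style of Example 3; concept names are hypothetical.
source_model = [("Sales/Movie*/Sold*", "sale"), ("Sales/Movie*", "video")]
target_model = [
    ("TransactionSummary/Video*/Sales*", "sale"),
    ("TransactionSummary/Video*", "video"),
]
mappings = generate_mappings(source_model, target_model)
for source_path, target_path in mappings:
    print(source_path, "->", target_path)
```

Sorting by target depth first ensures that a parent construct in the target is mapped before any of its children, which is what allows the "*" on an already-mapped ancestor to be ignored when a deeper entry is applied.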
Example 1
The concept of a fill order when transforming between the source and target documents of Example 1 is meaningless because there are no iterative constructs in the source document of this example.
Examples 2 and 3
Only Example 3 is discussed, as Example 2 is really a simplification of Example 3.
The meta-data is:
Figure imgf000032_0001
[0048] A semantic model for the source and target documents of this example would specify that the construct or element Sales/Movie is to be mapped to the construct TransactionSummary/Video and that the construct Sales/Movie/Sold is to be mapped to the construct TransactionSummary/Video/Sales. The processing constraints or fill order that a transformation must satisfy in transforming between the source and target documents of this example include:
1. For each <Movie> element create a new <Video> element; and
2. For each <Sold> element create a new <Sales> element.
The model for the meta-data and fill order is:
Figure imgf000032_0002
Figure imgf000033_0001
The model above relates <Sales/Movie> and <TransactionSummary/Video> with the same semantic concept. Both of the model entries end in a "*", so the fill order is: for each <Movie> create a <Video>. The model similarly relates <Sales/Movie/Sold> and <TransactionSummary/Video/Sales>. Because <Sales/Movie> and <TransactionSummary/Video> have already been mapped and a fill order specified, the "*" character after <Movie> and <Video> can be ignored at this point. However, <Sold> and <Sales> each have a "*" after them, so they are mapped with a "for each" fill order. The end result is the mapping and fill order needed to transform the source to the target.
[0049] Example 4
The meta-data is:
Figure imgf000033_0002
[0050] A semantic model for the source and target documents of this example would specify that the construct or element CatalogUpdate/Video is to be mapped to the construct StarList/Star and that CatalogUpdate/Video/Actor is also to be mapped to the construct StarList/Star.
The fill order or processing constraint that a transformation for transforming between the source and target documents of this example must satisfy includes:
For every <Video> for every <Actor> in the original document, create a new <Star> element.
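The nested "for every <Video> for every <Actor>" fill order stated above can be sketched in Python with the standard library's ElementTree module. The sketch assumes that each <Video> carries <Title>, <Year> and repeating <Actor> children and that each <Star> carries <Actor>, <Title> and <Year>, as suggested by the document fragments reproduced in this description; the function name is hypothetical and the sample is abbreviated.

```python
import xml.etree.ElementTree as ET


def catalog_to_star_list(catalog):
    """Create one Star per (Video, Actor) pair in the catalog."""
    star_list = ET.Element("StarList")
    for video in catalog.findall("Video"):
        for actor in video.findall("Actor"):
            star = ET.SubElement(star_list, "Star")
            ET.SubElement(star, "Actor").text = actor.text
            # Title and Year are repeated for every actor of the video.
            ET.SubElement(star, "Title").text = video.findtext("Title")
            ET.SubElement(star, "Year").text = video.findtext("Year")
    return star_list


catalog = ET.fromstring(
    "<CatalogUpdate>"
    "<Video><Title>Fight Club</Title><Year>1999</Year>"
    "<Actor>Brad Pitt</Actor><Actor>Meat Loaf</Actor></Video>"
    "<Video><Title>Gladiator</Title><Year>2000</Year>"
    "<Actor>Russell Crowe</Actor></Video>"
    "</CatalogUpdate>"
)
star_list = catalog_to_star_list(catalog)
print(ET.tostring(star_list, encoding="unicode"))
```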
The model for the meta-data and fill order is:
Figure imgf000033_0003
Figure imgf000034_0001
The fill order for this mapping can be described with a single source and a single target model entry, even though the fill order is complex. The model entries are traversed from left to right to identify mappings and fill order/processing constraints. <CatalogUpdate/Video> in the source and <StarList/Star> in the target each have a "*", so they are mapped with a "for each" fill order. The fill order instruction "[Target:GroupBy:Title,Year]" applies only to the target, so is ignored here. Since <CatalogUpdate/Video/Actor> in the source has a "*" in the model but has not been mapped nor a fill order specified, and the target has no more iterating constructs to match it to, <CatalogUpdate/Video/Actor> is matched to the rightmost iterating construct in the target, which is <StarList/Star>, and the fill order is "for each". The model thus specifies the mappings and "fill order" a transformation must satisfy in transforming data between the source and target documents.
[0051] Example 5
The meta-data is:
Figure imgf000034_0002
[0052] A semantic model for the source and target documents of this example would specify that the construct or element StarList/Star is to be mapped to the constructs CatalogUpdate/Video and CatalogUpdate/Video/Actor. The fill order or processing constraints that a transformation for transforming between the source and target documents of this example must satisfy include: For every <Star>, create a new <Video> record if the <Title> or <Year> holds a different value than in the previous <Star>, and create a new <Actor> record. The model for the meta-data and fill order is:
Figure imgf000034_0003
Figure imgf000035_0001
As in the previous example, but with source and target reversed, the model specifies that the element <StarList/Star> be mapped to two elements, viz. <CatalogUpdate/Video> and <CatalogUpdate/Video/Actor>. The model also specifies the special fill order instruction "[Target:GroupBy:Title,Year]", which means that on the target side, the fill order processing "groups by" the <CatalogUpdate/Video/Title> and <CatalogUpdate/Video/Year> elements. In other words, a transformation must group all the <CatalogUpdate/Video> records that have identical values for the <Title> and <Year> elements. Accordingly, a new <Video> is created only when the group by condition of an element is different from the group by condition of the previous element. However, a new <Actor> element is always created. Thus the model satisfies the fill order constraints needed for this transform.
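A minimal Python sketch of the "group by" fill order just described is given below. It assumes, as the text states, that a new <Video> is opened only when the (Title, Year) pair differs from that of the previous <Star>, so that stars of the same video are expected to be adjacent in the source; the function name and the abbreviated sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def star_list_to_catalog(star_list):
    """Rebuild Video records from Star records, grouping by Title and Year."""
    catalog = ET.Element("CatalogUpdate")
    video = None
    previous_key = None
    for star in star_list.findall("Star"):
        key = (star.findtext("Title"), star.findtext("Year"))
        # Create a new Video only when the group-by key changes.
        if video is None or key != previous_key:
            video = ET.SubElement(catalog, "Video")
            ET.SubElement(video, "Title").text = key[0]
            ET.SubElement(video, "Year").text = key[1]
            previous_key = key
        # A new Actor is always created.
        ET.SubElement(video, "Actor").text = star.findtext("Actor")
    return catalog


star_list = ET.fromstring(
    "<StarList>"
    "<Star><Actor>Brad Pitt</Actor><Title>Fight Club</Title><Year>1999</Year></Star>"
    "<Star><Actor>Meat Loaf</Actor><Title>Fight Club</Title><Year>1999</Year></Star>"
    "<Star><Actor>Russell Crowe</Actor><Title>Gladiator</Title><Year>2000</Year></Star>"
    "</StarList>"
)
catalog = star_list_to_catalog(star_list)
print(ET.tostring(catalog, encoding="unicode"))
```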
[0053] Example 6 The meta-data is:
Figure imgf000035_0002
[0054] A semantic model for the source and target documents of this example would specify that the construct or element StarList/Star is to be mapped to the construct ThreeLists/Actor, that the construct StarList/Star is to be mapped to the construct ThreeLists/Title, and that the construct StarList/Star is to be mapped to the construct ThreeLists/Year.
The fill order is: Make three passes over the source document, such that in pass one: for each <Star>, generate an <Actor>; in pass two: For each <Star>, generate a <Title>; and in pass three: For each <Star>, generate a <Year>.
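The three-pass fill order stated above can be sketched in Python as follows. The sketch assumes that <ThreeLists> holds all <Actor> elements, then all <Title> elements, then all <Year> elements as direct children; the function name and the abbreviated sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def star_list_to_three_lists(star_list):
    """Fill the target in three passes over the same Star elements."""
    three_lists = ET.Element("ThreeLists")
    stars = star_list.findall("Star")
    # One complete pass over the source per target list.
    for field in ("Actor", "Title", "Year"):
        for star in stars:
            ET.SubElement(three_lists, field).text = star.findtext(field)
    return three_lists


star_list = ET.fromstring(
    "<StarList>"
    "<Star><Actor>Brad Pitt</Actor><Title>Fight Club</Title><Year>1999</Year></Star>"
    "<Star><Actor>Russell Crowe</Actor><Title>Gladiator</Title><Year>2000</Year></Star>"
    "</StarList>"
)
three_lists = star_list_to_three_lists(star_list)
print(ET.tostring(three_lists, encoding="unicode"))
```

The outer loop runs over the target lists and the inner loop over the source, which reflects the target-centric processing order used in these examples.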
The model for the meta-data and fill order is:
Figure imgf000036_0001
The model specifies three semantic matches between the source and the target. Iterating construct <StarList/Star> maps to <ThreeLists/Actor> with a "for each" fill order. <StarList/Star> maps to <ThreeLists/Title> with a "for each" fill order. <StarList/Star> maps to <ThreeLists/Year> with a "for each" fill order. If the transform is processed in a target-centric fashion, top to bottom and left to right in terms of the target document meta-data and not the source document meta-data, then the transform cannot generate all three target iterating constructs simultaneously, since the three are independent of each other. Thus, the first must be processed, then the second, then the third. Thus the model produces the mapping and fill order needed to transform the source to the target.
[0055] Example 7
The meta-data is:
Figure imgf000036_0002
[0056] A semantic model for the source and target documents of this example would specify that the construct or element ThreeLists/Actor is to be mapped to the construct StarList/Star, that the construct ThreeLists/Title is to be mapped to the construct StarList/Star, and that the construct ThreeLists/Year is to be mapped to the construct StarList/Star. The fill order is: For each unique "triple" of <Actor>, <Title> and <Year>, create a new <Star>. The model for the meta-data and fill order is:
Figure imgf000036_0003
Figure imgf000037_0001
The model specifies mappings from <ThreeLists/Actor>, <ThreeLists/Title> and <ThreeLists/Year> in the source document to <StarList/Star> in the target document, each with a "for each" fill order. The three source constructs are independent of each other, and the transformation is target-centric, thus all three maps are processed simultaneously as three lists in lock step with each other, producing the fill order of the transformation.
[0057] Example 8
The meta-data is:
Figure imgf000037_0002
[0058] A semantic model for the source and target documents of this example would specify that the constructs or elements AgeCatalogUpdate/OldVideo and AgeCatalogUpdate/NewVideo are to be mapped to the construct CatalogUpdate/Video and that the constructs AgeCatalogUpdate/OldVideo/Actor and AgeCatalogUpdate/NewVideo/Actor are to be mapped to the construct CatalogUpdate/Video/Actor. The fill order is:
1. For each <OldVideo> create a new <Video>;
   a. For each <OldVideo/Actor> create a new <Actor>;
2. For each <NewVideo> create a new <Video>;
   a. For each <NewVideo/Actor> create a new <Actor>
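The fill order listed above can be sketched in Python as follows. The sketch assumes that <OldVideo> and <NewVideo> share the same children (<Title>, <Year>, repeating <Actor>) and that the old videos are emitted before the new ones, following the order of the list; the function name and the abbreviated sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def concatenate_catalog(age_catalog):
    """Copy OldVideo and then NewVideo records into a single Video list."""
    catalog = ET.Element("CatalogUpdate")
    # Each source list is processed separately, one after the other.
    for source_tag in ("OldVideo", "NewVideo"):
        for source_video in age_catalog.findall(source_tag):
            video = ET.SubElement(catalog, "Video")
            ET.SubElement(video, "Title").text = source_video.findtext("Title")
            ET.SubElement(video, "Year").text = source_video.findtext("Year")
            for actor in source_video.findall("Actor"):
                ET.SubElement(video, "Actor").text = actor.text
    return catalog


age_catalog = ET.fromstring(
    "<AgeCatalogUpdate>"
    "<NewVideo><Title>Fight Club</Title><Year>1999</Year>"
    "<Actor>Brad Pitt</Actor></NewVideo>"
    "<OldVideo><Title>Gone With the Wind</Title><Year>1939</Year>"
    "<Actor>Clark Gable</Actor><Actor>Leslie Howard</Actor></OldVideo>"
    "</AgeCatalogUpdate>"
)
catalog = concatenate_catalog(age_catalog)
print(ET.tostring(catalog, encoding="unicode"))
```

Processing the two source lists one after the other, rather than in lock step, is the behavior that distinguishes this example from the parallel-list processing of Example 7.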
The model for the meta-data and fill order is:
Figure imgf000038_0001
The model entries for the video semantic concepts are applied before the model entries for the actor semantic concepts because the target-side construct <CatalogUpdate/Video> is higher up in the meta-data than <CatalogUpdate/Video/Actor>. For each semantic concept, either of the source-side constructs can be matched first because neither source-side construct is under the other. For the video semantic concept, the independent source-side constructs <AgeCatalogUpdate/OldVideo> and <AgeCatalogUpdate/NewVideo> are both mapped to <CatalogUpdate/Video> in the target. The fill order for both is "for each" with the "Source:Excl" (exclusive) modifier, so that each is processed separately, one after the other, unlike the parallel-list "for each" processing of Example 7. For the actor semantic concept, the model specifies that the source elements <AgeCatalogUpdate/OldVideo/Actor> and <AgeCatalogUpdate/NewVideo/Actor> are both mapped to the target element <CatalogUpdate/Video/Actor>. Because <AgeCatalogUpdate/OldVideo> and <AgeCatalogUpdate/NewVideo> were already mapped to <CatalogUpdate/Video>, they are not affected by applying the actor semantic concept's model entries, and thus the "[Target:GroupBy:Title,Year]" fill-order instruction does not apply. Thus, each of the actor mappings has a "for each" fill order. Applying the video and actor semantic concept entries produces the mapping and "fill order" necessary for the transform.
[0059] Note that any special processing that is required for the transform in moving data from a <Title>, <Actor> or <Year> element from the source to the target must be specified. Since in these examples transforms are target-centric, any special processing rules for data values are also target-centric. One way to resolve this problem is to create a special "clone" of <CatalogUpdate/Video> in the target, and map the source's <NewVideo> and any children of <NewVideo> to the clone.
[0060] Example 9
The meta-data is:
Figure imgf000039_0001
[0061] A semantic model for the source and target documents of this example would specify that the construct or element CatalogUpdate/Video is to be mapped to the constructs AgeCatalogUpdate/OldVideo and AgeCatalogUpdate/NewVideo, and that the construct CatalogUpdate/Video/Actor is to be mapped to the constructs AgeCatalogUpdate/OldVideo/Actor and AgeCatalogUpdate/NewVideo/Actor. The fill order is:
1. For each <Video> element, if the <Year> value is less than or equal to 1970, create an <OldVideo> element.
   a. For every <Actor> create a new <Actor>.
2. For each <Video> element, if the <Year> value is greater than 1970, create a <NewVideo> element.
   a. For every <Actor> create a new <Actor>.
The model for the meta-data and fill order is:
Figure imgf000040_0001
In accordance with the model, <CatalogUpdate/Video> is mapped to <AgeCatalogUpdate/OldVideo> and <CatalogUpdate/Video> is mapped to <AgeCatalogUpdate/NewVideo>.
[0062] Embodiments of the present invention allow conditional mappings, i.e., mappings in which a target element is created if and only if the condition is met. In this example, there is a conditional mapping in that only some of the <Video> elements in the source document are copied to each target construct. The fill order from <CatalogUpdate/Video> to <AgeCatalogUpdate/OldVideo> is "for each" modified by the clause "Target:Cond:Year <= 1970" in the model. Thus, an <OldVideo> construct will be created only if the value in the <Year> field of the source <Video> is less than or equal to 1970. The mapping from <CatalogUpdate/Video> to <AgeCatalogUpdate/NewVideo> similarly has a "for each" with a condition that the value in the <Year> field is greater than 1970.
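As a rough illustration of the conditional "for each" fill order, the sketch below applies the two conditions (Year <= 1970 and Year > 1970) in turn to every <Video> element. It is not the transform generated by the described system. The sample document, the placeholder titles and actor names, the `CONDITIONAL_MAPPINGS` table, and the `transform` function are all assumptions made for illustration, as is the choice to treat <Title>, <Year>, and <Actor> as simple text children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; titles and actor names are placeholders.
SOURCE_XML = """
<CatalogUpdate>
  <Video><Title>Title A</Title><Year>1942</Year><Actor>Actor 1</Actor><Actor>Actor 2</Actor></Video>
  <Video><Title>Title B</Title><Year>1979</Year><Actor>Actor 3</Actor></Video>
  <Video><Title>Title C</Title><Year>1970</Year><Actor>Actor 1</Actor></Video>
</CatalogUpdate>
"""

# One entry per conditional mapping: (target element name, condition on <Year>).
# This table is an illustrative stand-in for the "Target:Cond" clauses.
CONDITIONAL_MAPPINGS = [
    ("OldVideo", lambda year: year <= 1970),
    ("NewVideo", lambda year: year > 1970),
]


def transform(source_root):
    """Build an <AgeCatalogUpdate> tree from a <CatalogUpdate> tree."""
    target_root = ET.Element("AgeCatalogUpdate")
    # Each mapping is a separate "for each" pass over the source videos.
    for target_tag, condition in CONDITIONAL_MAPPINGS:
        for video in source_root.findall("Video"):
            # The target element is created only if the condition is met.
            if not condition(int(video.findtext("Year"))):
                continue
            target_video = ET.SubElement(target_root, target_tag)
            for name in ("Title", "Year"):
                ET.SubElement(target_video, name).text = video.findtext(name)
            # Nested "for each": every source <Actor> yields a new <Actor>.
            for actor in video.findall("Actor"):
                ET.SubElement(target_video, "Actor").text = actor.text
    return target_root


target = transform(ET.fromstring(SOURCE_XML))
```

Because each mapping is a separate pass, all <OldVideo> elements are emitted before any <NewVideo> element in this sketch. A single pass that picks the target name per video would interleave them instead.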
[0063] In this example, <CatalogUpdate/Video/Actor> is mapped to both <AgeCatalogUpdate/OldVideo/Actor> and <AgeCatalogUpdate/NewVideo/Actor>, each with a "for each" fill order. The <OldVideo> and <NewVideo> constructs in the target document are not re-mapped to <Video> in the source document, since the processing of these two mappings is dependent on their parent mappings. Thus, the model specifies the mapping and "fill order" needed to transform the source document to the target.
Example 10
[0064] In this example, the source document comprises a list of videos, where a video can have multiple actors. The source document is to be converted into a target document comprising a list of actors wherein each actor could have starred in multiple videos. The meta-data is:
Figure imgf000041_0001
A semantic model for the source and target documents for this example would specify that the construct or element CatalogUpdate/Video/Actor is to be mapped to the construct GroupedStarList/Actor and that the construct CatalogUpdate/Video is to be mapped to the construct GroupedStarList/Actor/Movie.
[0065] The fill order is: For every <CatalogUpdate/Video/Actor> in the source, first create a <GroupedStarList/Actor> element. Then create a single <Movie> element. Populate the <ActorName>, <Movie/Title> and <Movie/Year> elements with data from the source. Reorganize the target document to group all the <Movie> elements under the correct <Actor>. The model for the meta-data and fill order is:
Figure imgf000041_0002
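The fill order just described can be sketched in two stages: first move the data, creating one <Actor> with a single <Movie> for every source <Video>/<Actor> pair, and then regroup the <Movie> elements under one <Actor> per distinct <ActorName>. This is a minimal sketch and not the generated transform. The function names `move_data` and `group_by_actor_name`, the placeholder sample values, and the assumption that a source <Actor> is a simple text element are all illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; titles and actor names are placeholders.
SOURCE_XML = """
<CatalogUpdate>
  <Video><Title>Title X</Title><Year>1960</Year><Actor>Actor P</Actor><Actor>Actor Q</Actor></Video>
  <Video><Title>Title Y</Title><Year>1985</Year><Actor>Actor P</Actor></Video>
</CatalogUpdate>
"""


def move_data(source_root):
    """Stage 1: one <Actor> holding one <Movie> per source Video/Actor pair."""
    flat = ET.Element("GroupedStarList")
    for video in source_root.findall("Video"):
        for actor in video.findall("Actor"):
            actor_el = ET.SubElement(flat, "Actor")
            ET.SubElement(actor_el, "ActorName").text = actor.text
            movie = ET.SubElement(actor_el, "Movie")
            ET.SubElement(movie, "Title").text = video.findtext("Title")
            ET.SubElement(movie, "Year").text = video.findtext("Year")
    return flat


def group_by_actor_name(flat):
    """Stage 2: merge <Actor> elements that share the same <ActorName>."""
    grouped = ET.Element("GroupedStarList")
    by_name = {}  # ActorName value -> merged <Actor> element
    for actor_el in flat.findall("Actor"):
        name = actor_el.findtext("ActorName")
        if name not in by_name:
            merged = ET.SubElement(grouped, "Actor")
            ET.SubElement(merged, "ActorName").text = name
            by_name[name] = merged
        # Move this occurrence's <Movie> under the merged <Actor>.
        by_name[name].extend(actor_el.findall("Movie"))
    return grouped


grouped = group_by_actor_name(move_data(ET.fromstring(SOURCE_XML)))
```

Keeping the two stages as separate functions mirrors the order of the fill-order steps: the flat intermediate document is complete before any grouping is applied.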
[0066] To apply the model, the actor semantic concept is applied first, because its target model entry is the parent of the video target model entry. In applying the actor model entries, <CatalogUpdate/Video> and <CatalogUpdate/Video/Actor> in the source document both map to <GroupedStarList/Actor> in the target with a "for each" fill order that groups the target elements by the values in <ActorName>. In applying the video semantic concept, <GroupedStarList/Actor> in the target is already mapped, but <GroupedStarList/Actor/Movie> is not. In the source, <CatalogUpdate/Video> has already been mapped. Because <CatalogUpdate/Video> is an ancestor of <CatalogUpdate/Video/Actor> and remapping <CatalogUpdate/Video> to <GroupedStarList/Actor/Movie> satisfies both the video model entries and the actor model entries, it is remapped, with a "for each" fill order. The processing is done in two stages: first, the data is moved from the source document to the target document, and thereafter it is organized in terms of the group-by clause.
[0067] It will be appreciated that modeling the meta-data of a document once allows the document to be mapped to multiple independent target or multiple independent source documents. By simple extension of the techniques described above, additional documents can be modeled and mapped.
[0068] Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader spirit of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.

Claims

What is claimed is:
1. A method for transforming a source document in a source format to a target document in a target format, the method comprising: deriving a semantic model for iterative constructs in the source and target documents including defining processing constraints for each iterative construct in the source document that a transformation for transforming the construct to its equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model.
2. The method of claim 1, wherein an iterative construct comprises a set of semantically related elements, at least one of which iterates within the construct.
3. The method of claim 2, wherein each iterative construct itself iterates within the document in which it occurs.
4. The method of claim 3, wherein deriving the semantic model comprises disambiguating duplicate element names within an iterative construct.
5. The method of claim 4, wherein disambiguating a duplicate element name comprises renaming the duplicate element names.
6. The method of claim 4, wherein disambiguating a duplicate element name comprises uniquely identifying each instance of the element name in the semantic model.
7. The method of claim 3, wherein the processing constraints comprise constraints on an order that a transformation processes the elements within an iterative construct in the source document.
8. The method of claim 3, wherein the processing constraints comprise conditions for a transformation to create a new element within an iterative construct in the target document.
9. A method for transforming a source document in a source format to a target document in a target format, the method comprising: identifying a first set of semantically related elements in the source document, wherein at least one of the elements iterates; identifying a second set of semantically related elements in the target document, wherein at least one of the elements iterates, and wherein the first and second sets are semantically equivalent; deriving a model which specifies which elements in the first and second sets are semantically equivalent including defining processing rules which a transformation to transform data from the first set to data in the second set must satisfy.
10. The method of claim 9, wherein deriving the model comprises uniquely identifying duplicate elements in the first and second sets.
11. The method of claim 9, wherein uniquely identifying duplicate elements comprises assigning a unique name to each duplicate element.
12. The method of claim 9, wherein uniquely identifying duplicate element names comprises identifying duplicate element names based on a hierarchy of each duplicate element name within a set.
13. The method of claim 9, wherein the processing rules specify an order in which the elements in the first set must be processed by the transformation.
14. The method of claim 9, wherein the processing rules specify conditions for creating new elements in the second set.
15. The method of claim 9, further comprising automatically generating the transformation based upon the model.
16. A system for transforming a source document in a source format to a target document in a target format, the system comprising a processor and a memory coupled thereto, the memory storing instructions which when executed by the processor cause the processor to perform a method comprising: deriving a semantic model for iterative constructs in the source and target documents including defining processing constraints for each iterative construct in the source document that a transformation for transforming the construct to its equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model and the processing constraints.
17. The system of claim 16, wherein an iterative construct comprises a set of semantically related elements, at least one of which iterates within the construct.
18. The system of claim 17, wherein each iterative construct itself iterates within the document in which it occurs.
19. The system of claim 18, wherein deriving the semantic model comprises disambiguating duplicate element names within an iterative construct.
20. The system of claim 19, wherein disambiguating a duplicate element name comprises renaming the duplicate element names.
21. The system of claim 19, wherein disambiguating a duplicate element name comprises uniquely identifying each instance of the element name in the semantic model.
22. The system of claim 18, wherein the processing constraints comprise constraints on an order that a transformation processes the elements within an iterative construct in the source document.
23. The system of claim 18, wherein the processing constraints comprise conditions for a transformation to create a new element within an iterative construct in the target document.
24. A system for transforming a source document in a source format to a target document in a target format, the system comprising a processor and a memory coupled thereto, the memory storing instructions which when executed by the processor cause the processor to perform a method comprising: identifying a first set of semantically related elements in the source document, wherein at least one of the elements iterates; identifying a second set of semantically related elements in the target document, wherein at least one of the elements iterates, and wherein the first and second sets are semantically equivalent; deriving a model which specifies which elements in the first and second sets are semantically equivalent including defining processing rules which a transformation to transform data from the first set to data in the second set must satisfy.
25. The system of claim 24, wherein deriving the model comprises uniquely identifying duplicate elements in the first and second sets.
26. The system of claim 25, wherein uniquely identifying duplicate elements comprises assigning a unique name to each duplicate element.
27. The system of claim 25, wherein uniquely identifying duplicate element names comprises identifying duplicate element names based on a hierarchy of each duplicate element name within a set.
28. The system of claim 24, wherein the processing rules specify an order in which the element in the first set must be processed by the transformation.
29. The system of claim 24, wherein the processing rules specify conditions for creating new elements in the second set.
30. The system of claim 24, wherein the method further comprises automatically generating the transformation based upon the model.
31. A computer readable medium having stored thereon a sequence of instructions which when executed by a computer cause the computer to perform a method comprising: deriving a semantic model for iterative constructs in the source and target documents including defining processing constraints for each iterative construct in the source document that a transformation for transforming the construct to its equivalent construct in the target document must satisfy; and automatically generating each transformation based on the semantic model.
32. The computer-readable medium of claim 31, wherein an iterative construct comprises a set of semantically related elements, at least one of which iterates within the construct.
33. The computer-readable medium of claim 32, wherein each iterative construct itself iterates within the document in which it occurs.
34. The computer-readable medium of claim 31, wherein deriving the semantic model comprises disambiguating duplicate element names within an iterative construct.
35. The computer-readable medium of claim 34, wherein disambiguating a duplicate element name comprises renaming the duplicate element names.
36. The computer-readable medium of claim 34, wherein disambiguating a duplicate element name comprises uniquely identifying each instance of the element name in the semantic model.
37. The computer-readable medium of claim 33, wherein the processing constraints comprise constraints on an order that a transformation processes the elements within an iterative construct in the source document.
38. The computer-readable medium of claim 33, wherein the processing constraints comprise conditions for a transformation to create a new element within an iterative construct in the target document.
39. The computer-readable medium having stored thereon a sequence of instructions which when executed by a computer, cause a computer to perform a method comprising: identifying a first set of semantically related elements in the source document, wherein at least one of the elements iterates; identifying a second set of semantically related elements in the target document, wherein at least one of the elements iterates, and wherein the first and second sets are semantically equivalent; deriving a model which specifies which elements in the first and second sets are semantically equivalent including defining processing rules which a transformation to transform data from the first set to data in the second set must satisfy.
40. The computer-readable medium of claim 39, wherein deriving the model comprises uniquely identifying duplicate elements in the first and second set.
41. The computer-readable medium of claim 40, wherein uniquely identifying duplicate elements comprises assigning a unique name to each duplicate element.
42. The computer-readable medium of claim 40, wherein uniquely identifying duplicate element names comprises identifying duplicate element names based on a hierarchy of each duplicate element name within a set.
43. The computer-readable medium of claim 39, wherein the processing rules specify an order in which the element in the first set must be processed by the transformation.
44. The computer-readable medium of claim 39, wherein the processing rules specify conditions for creating new elements in the second set.
45. The computer-readable medium of claim 39, wherein the method further comprises automatically generating the transformation based upon the model.
46. A system for transforming a source document in a source format to a target document in a target format, the system comprising: means for deriving a semantic model for iterative constructs in the source and target documents including defining processing constraints for each iterative construct in the source document that a transformation for transforming the construct to its equivalent construct in the target document must satisfy; and means for automatically generating each transformation based on the semantic model and the processing constraints.
47. A system for transforming a source document in a source format to a target document in a target format, the system comprising: means for identifying a first set of semantically related elements in the source document, wherein at least one of the elements iterates; means for identifying a second set of semantically related elements in the target document, wherein at least one of the elements iterates, and wherein the first and second sets are semantically equivalent; means for deriving a model which specifies which elements in the first and second sets are semantically equivalent including defining processing rules which a transformation to transform data from the first set to data in the second set must satisfy.
PCT/US2003/021624 2002-07-22 2003-07-11 Method and system for transforming semantically related documents WO2004010325A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003247971A AU2003247971A1 (en) 2002-07-22 2003-07-11 Method and system for transforming semantically related documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20091102A 2002-07-22 2002-07-22
US10/200,911 2002-07-22

Publications (1)

Publication Number Publication Date
WO2004010325A1 true WO2004010325A1 (en) 2004-01-29

Family

ID=30769577

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/021624 WO2004010325A1 (en) 2002-07-22 2003-07-11 Method and system for transforming semantically related documents

Country Status (2)

Country Link
AU (1) AU2003247971A1 (en)
WO (1) WO2004010325A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7454764B2 (en) 2005-06-29 2008-11-18 International Business Machines Corporation Method and system for on-demand programming model transformation
US10722607B2 (en) 2006-08-05 2020-07-28 Givaudan S.A. Perfume compositions

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002019154A1 (en) * 2000-08-29 2002-03-07 Contivo, Inc. Virtual groups

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BILL FRENCH: "Enterprise Integration Modeling", ACTIONLINE, January 2002 (2002-01-01), pages 26 - 28, XP002260688, Retrieved from the Internet <URL:http://www.contivo.com/news/articles/EIM_ActionLine.pdf> [retrieved on 20031107] *

Also Published As

Publication number Publication date
AU2003247971A1 (en) 2004-02-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP