WO2020172650A1 - Computer systems and methods for database schema transformations - Google Patents

Computer systems and methods for database schema transformations

Info

Publication number
WO2020172650A1
WO2020172650A1 PCT/US2020/019441 US2020019441W WO2020172650A1 WO 2020172650 A1 WO2020172650 A1 WO 2020172650A1 US 2020019441 W US2020019441 W US 2020019441W WO 2020172650 A1 WO2020172650 A1 WO 2020172650A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
substructure
accordance
formatted
Prior art date
Application number
PCT/US2020/019441
Other languages
French (fr)
Inventor
James Best
David PULLIN
Original Assignee
Paat, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Paat, Inc.
Publication of WO2020172650A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/211: Schema design and management
    • G06F16/213: Schema design and management with details for schema evolution support

Definitions

  • This disclosure relates generally to computer systems that utilize one or more databases and methods of operating the same.
  • Databases are often used to store and collect digital data in a manner that can be easily accessed and updated. To do this, the digital data stored in the database has to be organized in a manner that allows a client program to find and write digital data to the database. In many circumstances, this is done with database schemas. More specifically, database schemas provide data archetypes for things (e.g., materials, devices, places, sale items) or operations (e.g., steps in a manufacturing process, directions to a location, shipping of sale items, monitored events) that are being modeled by the database.
  • Certain manufacturing equipment may receive physical inputs (i.e., materials or devices received by the manufacturing equipment) and create physical outputs (i.e., materials or devices created by the manufacturing process implemented by the manufacturing equipment) from the physical inputs.
  • Data inputs and data outputs may be used to model information regarding those physical inputs and physical outputs, respectively.
  • These data inputs and data outputs may be stored and retrieved from a database.
  • The database may organize these data inputs and data outputs in accordance with database schemas that define data archetypes for modeling the physical inputs and the physical outputs.
  • The data outputs of one piece of manufacturing equipment need to become the data inputs for another piece of manufacturing equipment.
  • Different pieces of manufacturing equipment often include computer systems that use different incompatible software.
  • The database schemas for one piece of manufacturing equipment may be defined in a different data description language than the database schemas of another piece of manufacturing equipment along the assembly line. Dealing with these incompatible database schemas is often difficult.
  • Databases and client programs often assume incompatible database schemas, which prevents client programs from exchanging data with the databases because the client program and/or the databases cannot understand requests and data organized in accordance with incompatible database schemas and/or using incompatible data description languages.
  • a computer system performs a machine learning method. More specifically, one or more databases may be provided, wherein the database(s) store data structures formatted in accordance with database schemas and each of the database(s) includes at least one of the data structures formatted in accordance with at least one of the database schemas.
  • the database schemas can use tables and foreign key connections to describe the structure of data in the database.
  • the computer system implements a machine learning network to identify a plurality of equivalent data substructures in the data structures defined by the database schemas.
  • the database then constructs a name value type hierarchy (NVTH) that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network. Accordingly, the machine learning network can be used to learn equivalent data substructures in incompatible database schemas in an automated fashion.
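The grouping of equivalent data substructures can be illustrated with a deliberately simple stand-in. The sketch below does not implement a machine learning network; it substitutes a fixed structural heuristic (two substructures are treated as equivalent when they have the same shape, ignoring field names) purely to show what the output of the identification step might look like. The names `signature`, `group_equivalent`, and the sample schema labels are hypothetical.

```python
from collections import defaultdict


def signature(value):
    """Return a name-independent structural signature of a parsed value."""
    if isinstance(value, dict):
        # Sort child signatures so field names and field order do not matter.
        return ("object", tuple(sorted(signature(v) for v in value.values())))
    if isinstance(value, list):
        # Collapse repeated element shapes; an array of points is one shape.
        return ("array", tuple(sorted({signature(v) for v in value})))
    if value is None:
        return ("null",)
    if isinstance(value, bool):
        return ("boolean",)
    if isinstance(value, (int, float)):
        return ("number",)
    return ("string",)


def group_equivalent(substructures):
    """Group substructure names whose values share a structural signature."""
    groups = defaultdict(list)
    for name, value in substructures.items():
        groups[signature(value)].append(name)
    # Only groups with more than one member are candidate NVTH data types.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical substructures drawn from two incompatible schemas.
samples = {
    "schema_a.point": {"x": 1.0, "y": 2.0},
    "schema_b.coord": {"lat": 3, "lon": 4},
    "schema_a.label": {"text": "sleeve"},
}
equivalent = group_equivalent(samples)
```

A learned model would be expected to find semantic equivalences that a shape-only heuristic like this one misses or wrongly merges.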
  • The computer system performs a schema transformation method. More specifically, the computer system receives a first data substructure formatted in accordance with a first database schema. The first data substructure corresponds with a first data type of the NVTH. The computer system may then transform a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure also corresponds with the first data type of the NVTH. In this manner, the NVTH can be utilized to translate data from one database schema to another incompatible database schema without requiring a specialized and complex computer program. It should be noted that, in some implementations, the data item is the first data substructure itself. In other implementations, the data item may be the output of a function that uses the first data substructure as input.
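As a minimal sketch of the transformation step, assume the NVTH has already been reduced to a lookup table that records, for one data type, which field name each schema uses for each canonical field. The table layout, the `Rectangle` data type, and the schema labels `schema_json` and `schema_xml` are all hypothetical; the disclosure does not specify this representation.

```python
import xml.etree.ElementTree as ET

# Hypothetical NVTH entry: one data type with per-schema field names.
NVTH = {
    "Rectangle": {
        "schema_json": {"width": "width_cm", "height": "height_cm"},
        "schema_xml": {"width": "w", "height": "h"},
    }
}


def to_canonical(data_type, schema, record):
    """Rename a record's fields from a schema's names to canonical names."""
    mapping = NVTH[data_type][schema]
    return {canon: record[local] for canon, local in mapping.items()}


def to_xml(data_type, schema, canonical, tag):
    """Emit canonical fields as an XML element using the target schema's names."""
    mapping = NVTH[data_type][schema]
    root = ET.Element(tag)
    for canon, local in mapping.items():
        ET.SubElement(root, local).text = str(canonical[canon])
    return ET.tostring(root, encoding="unicode")


# First data substructure, formatted per the (hypothetical) JSON-side schema.
first = {"width_cm": 40, "height_cm": 60}
canonical = to_canonical("Rectangle", "schema_json", first)
second = to_xml("Rectangle", "schema_xml", canonical, "rect")
```

Here the data item is the first data substructure itself. To cover the other case described above, a function could be applied to `canonical` before `to_xml` is called.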
  • Figure 1 illustrates one embodiment of a computer system configured to implement a machine learning network that constructs an NVTH that can be utilized to translate data between incompatible database schemas.
  • Figure 2 visually represents exemplary data involved in the construction of a data type of the NVTH.
  • Figure 3A illustrates a detailed example of one of the database structures and one of the database schemas defined using JSON.
  • Figure 3B illustrates a detailed example of one of the database structures and one of the database schemas defined using XML.
  • Figure 4 visually represents exemplary data used to perform schema transformations with the NVTH.
  • Figure 5 illustrates exemplary procedures of a machine learning method.
  • Figure 6 illustrates exemplary procedures of a schema transformation method.
  • Figure 7 visually represents exemplary data involved in the construction of NVTHs from different data structures formatted in different data description languages.
  • Figure 8A illustrates a detailed example of an NVTH constructed from JSON data structures.
  • Figure 8B illustrates a detailed example of an NVTH constructed from XML data structures.
  • Figure 9 illustrates exemplary procedures of another machine learning method.
  • a database is a recognized database, in a recognized structure (e.g., tables).
  • The computational systems and methods described herein canonicalize incompatible database schemas to construct a Name Value Type Hierarchy (NVTH).
  • The NVTH defines data types assigned to semantically equivalent components of the database schemas.
  • The NVTH can then be used by an Application Programming Interface (API) to transform data formatted in one database schema into semantically equivalent data formatted in accordance with an incompatible database schema.
  • The API may encapsulate the data in accordance with a corresponding data type of the NVTH and then transform the encapsulated data into the semantically equivalent data of the other database schema.
  • A client application program may send read or write requests for data in accordance with its database schema even though the database schema of the database storing the data is incompatible with the database schema of the client application program.
  • The API can also perform a function on input data from a client application program and transform the resulting output data into semantically equivalent data formatted in accordance with the incompatible database schemas of the database.
  • The NVTH can, in addition, allow the API to transform output data from a function performed by the client application program.
  • The API can also use the NVTH to transform data formatted in accordance with the database schema of one database into semantically equivalent data formatted in accordance with an incompatible database schema of another database.
  • The systems and methods are particularly useful when dealing with database schemas defined in different data description languages. More specifically, the data types of the NVTH may be canonicalizations of equivalent components of database schemas defined in different data description languages. In some embodiments, this allows the API to encapsulate the data regardless of the data description language used to define the database schema. This encapsulation, in effect, allows the API to "observe" the semantics of the data without being impeded by syntax and the specifics of the data description languages. Schema transformation rules may be used by the API to map input and output data in the database schemas to the NVTH and provide schema transformations to and from the database schemas. The systems and methods thus improve the operation of computer systems by allowing databases and client application programs that use heterogeneous and incompatible database schemas to exchange and manipulate data without regard to the idiosyncrasies of the particular database schemas and/or the data description languages.
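One way to picture encapsulation that is independent of the data description language: parse each input with the parser for its own language, then apply a schema transformation rule that renames fields to canonical names. In the sketch below, the `Encapsulated` class, the rule dictionaries, and the `Rectangle` data type are hypothetical, and all values are normalized to strings only to keep the comparison simple.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Encapsulated:
    """A value wrapped in a (hypothetical) NVTH data type."""
    data_type: str
    fields: tuple  # sorted (canonical name, value-as-string) pairs


def from_json(text, data_type, rules):
    """Encapsulate a flat JSON object; rules map local names to canonical names."""
    record = json.loads(text)
    pairs = sorted((rules[name], str(value)) for name, value in record.items())
    return Encapsulated(data_type, tuple(pairs))


def from_xml(text, data_type, rules):
    """Encapsulate a flat XML element; rules map child tags to canonical names."""
    root = ET.fromstring(text)
    pairs = sorted((rules[child.tag], child.text) for child in root)
    return Encapsulated(data_type, tuple(pairs))


# Hypothetical schema transformation rules for two schemas.
JSON_RULES = {"width_cm": "width", "height_cm": "height"}
XML_RULES = {"w": "width", "h": "height"}

a = from_json('{"width_cm": 40, "height_cm": 60}', "Rectangle", JSON_RULES)
b = from_xml("<rect><w>40</w><h>60</h></rect>", "Rectangle", XML_RULES)
```

After encapsulation the two inputs compare equal even though one arrived as JSON text and the other as XML text, which is the sense in which syntax no longer impedes the comparison.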
  • the systems and methods described herein may utilize machine learning networks implemented by a computer system to canonicalize the database schemas in order to construct the NVTH, as explained in further detail below.
  • This is a significant advantage over previously known systems.
  • the machine learning techniques described herein can determine the relationships for database schema transformations regardless of the amount of things, operations, actions, and services being modeled and the variety of data description languages used to model them.
  • constructing an NVTH that canonicalizes the database schemas provides a generalized solution that does not require a human to work out complex transformations between the database schemas.
  • the systems and methods have a wide application anywhere in the computer industry where incompatible database schemas are a problem.
  • One particularly important application for the systems and methods described herein relates to computer systems utilized in manufacturing facilities.
  • manufacturing facilities often have an assembly line with manufacturing equipment created by different manufacturers.
  • Each piece of manufacturing equipment may include a computer system that communicates with a database in order to receive input information regarding the materials or devices received by the equipment and to transmit information regarding the materials or devices output from the manufacturing equipment.
  • input data is received by the manufacturing equipment regarding the materials and/or devices received by the equipment and output data is transmitted to the database regarding the output materials and/or devices output from the manufacturing equipment.
  • This input data and output data often model different aspects of the input and output materials and/or devices in accordance with a database schema.
  • the other manufacturing equipment in the assembly line may use different database schemas in order to model its input and output materials and/or devices.
  • The same devices and materials may be modeled in accordance with different and incompatible database schemas. This is particularly problematic when different data description languages are used to create the database schemas and when the client application programs for each piece of manufacturing equipment use different languages to make queries and requests.
  • machine learning networks can be used to identify a plurality of equivalent data substructures defined by different database schemas and create the NVTH, as explained in further detail below.
  • the NVTH can then be used by the API to identify the equivalent data substructures when both pieces of manufacturing equipment are operating. This allows for the various pieces of manufacturing equipment to communicate input and output data regardless of the database schemas used to model the input and output materials and/or devices.
  • An NVTH is an extension of a name value hierarchy (NVH), where "type" describes a piece of hierarchy.
  • The self-describing nature of the "T" runs through the hierarchy of the NVTH.
  • An NVH in JSON has types (numbers, strings, etc.) and arrays and objects (hierarchy).
  • The "type" of the NVTH provides a hierarchy value for a data substructure along a schema.
  • With regard to garment manufacturing, the data substructure may range from a piece of cloth, to a polygon, to points on a polygon.
  • The NVTH thus allows for data substructures in a data structure to be modularized within an overall data format provided by the NVTH.
  • The NVTH provides a database schema without actually knowing anything about the formalized and digitally recorded database schema for a data object or data record.
  • The schema is the "type" (e.g., Number, or a Type such as Polygon or Zipper).
  • An NVTH provides a more modular schema.
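The idea that each node of the hierarchy carries a name, a value, and a type can be sketched as a small tree. The `Node` class, its field names, and the garment example values below are hypothetical illustrations based on the cloth, polygon, and point example above; they are not a format defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """One entry in a (hypothetical) name value type hierarchy."""
    name: str
    type: str
    value: Any = None
    children: List["Node"] = field(default_factory=list)


def type_path(node, target_name, prefix=()):
    """Return the chain of types from the root down to the named node."""
    path = prefix + (node.type,)
    if node.name == target_name:
        return path
    for child in node.children:
        found = type_path(child, target_name, path)
        if found is not None:
            return found
    return None


# A piece of cloth, described by a polygon, described by points.
cloth = Node("front_panel", "Cloth", children=[
    Node("outline", "Polygon", children=[
        Node("p0", "Point", value=(0, 0)),
        Node("p1", "Point", value=(40, 0)),
        Node("p2", "Point", value=(40, 60)),
    ]),
])
path_to_p1 = type_path(cloth, "p1")
```

Because every node names its own type, a consumer can walk the tree and recognize a `Polygon` wherever it appears without consulting a separate schema document.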
  • a computer may be a processor-controlled device, such as, by way of example, personal computers, workstations, servers, clients, minicomputers, mainframe computers, laptop computers, smartphones, tablets, a network of one or more individual computers, mobile computers, portable computers, handheld computers, palm-top computers, set-top boxes for a TV, interactive televisions, interactive kiosks, personal digital assistants, interactive wireless devices, or any combination thereof.
  • a computer may be a uniprocessor or multiprocessor machine. Accordingly, a computer may include one or more processors and, thus, the aforementioned computer system may also include one or more processors. Examples of processors include sequential state machines, microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • the computer may include one or more memories. Accordingly, the aforementioned computer systems may include one or more memories.
  • a memory may include a memory storage device or an addressable storage medium which may include, by way of example, random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), hard disks, floppy disks, laser disk players, digital video disks, compact disks, videotapes, audio tapes, magnetic recording tracks, magnetic tunnel junction (MTJ) memory, optical memory storage, quantum mechanical storage, electronic networks, and/or other devices or technologies used to store electronic content such as programs and data.
  • the one or more memories may store computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the procedures and techniques described herein.
  • the one or more processors may be operably associated with the one or more memories so that the computer-executable instructions can be provided to the one or more processors for execution.
  • the one or more processors may be operably associated with the one or more memories through one or more buses.
  • The computer may possess or may be operably associated with input devices (e.g., a keyboard, a keypad, a controller, a mouse, a microphone, a touch screen, a sensor) and output devices (e.g., a computer screen, a printer, or a speaker).
  • The computer may execute an appropriate operating system such as LINUX®, UNIX®, MICROSOFT® WINDOWS®, APPLE® MACOS®, IBM® OS/2®, ANDROID, PALM® OS, and/or the like.
  • the computer may advantageously be equipped with a network communication device such as a network interface card, a modem, or other network connection device suitable for connecting to one or more networks.
  • a computer may advantageously contain control logic, or program logic, or other substrate configuration representing data and instructions, which cause the computer to operate in a specific and predefined manner as described herein.
  • The computer programs, when executed, enable a control processor to perform and/or cause the performance of features of the present disclosure.
  • the control logic may advantageously be implemented as one or more modules.
  • the modules may advantageously be configured to reside on the computer memory and execute on the one or more processors.
  • the modules include, but are not limited to, software or hardware components that perform certain tasks.
  • a module may include, by way of example, components, such as software components, processes, functions, subroutines, procedures, attributes, class components, task components, object-oriented software components, segments of program code, drivers, firmware, micro-code, circuitry, data, and/or the like.
  • The control logic conventionally includes the manipulation of digital bits by the processor and the maintenance of these bits within one or more of the memory storage devices.
  • memory storage devices may impose a physical organization upon the collection of stored data bits, which are generally stored by specific electrical or magnetic storage cells.
  • the control logic generally performs a sequence of computer-executed steps. These steps generally require manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It is conventional for those skilled in the art to refer to these signals as bits, values, elements, symbols, characters, text, terms, numbers, files, or the like. It should be kept in mind, however, that these and some other terms should be associated with appropriate physical quantities for computer operations, and that these terms are merely conventional labels applied to physical quantities that exist within and during operation of the computer based on designed relationships between these physical quantities and the symbolic values they represent.
  • Features of the computer systems can be implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). Implementation of the hardware circuitry will be apparent to persons skilled in the relevant art(s). In yet another embodiment, features of the computer systems can be implemented using a combination of both general-purpose hardware and software.
  • Figure 1 illustrates one embodiment of a computer system 100 configured to operate as described above.
  • the computer system 100 includes a plurality of client computers 102.
  • Each of the client computers 102 may be a processor-controlled device, such as, by way of example, personal computers, workstations, laptop computers, smartphones, tablets, individual computers, mobile computers, portable computers, handheld computers, palm-top computers, set-top boxes for a TV, interactive televisions, interactive kiosks, personal digital assistants, interactive wireless devices, and/or the like.
  • Each of the client computers 102 may include one or more processors (not explicitly shown) and at least one memory (not explicitly shown) configured to store computer-executable instructions.
  • The client application programs 104 among the various client computers 102 may be configured to make database requests for different types of database systems and, in particular, to use different database control languages (DCLs).
  • The DCLs provide the syntax used to control access to the data stored in the databases.
  • The client application programs 104 need to identify the different data entities that are the subject of the request, which are formatted in accordance with the database schemas presumed to be used by the databases to store the data.
  • one of the client application programs 104 is a Java client application program
  • another one of the client application programs 104 is an XML client application program
  • yet another one of the client application programs 104 is an SQL client application program
  • still another one of the client application programs 104 is a Codasyl client application program.
  • Each of the client application programs 104 is thus configured to make read database requests and receive data in response to those read database requests.
  • Each of the client application programs 104 may also be configured to provide write database requests to write data to the databases.
  • Each of these client application programs 104 will make these database requests and send/receive data associated with these database requests in accordance with the particular syntax required by the database management system for which the client application programs 104 are configured.
  • Each of the client computers 102 may be provided as part of different manufacturing equipment within the assembly line of a manufacturing facility. For example, within an industrial manufacturing facility, various different types of manufacturing equipment may be provided along an assembly line in order to manufacture goods. It is common for different manufacturing equipment in the manufacturing facility to come from different companies and to be designed in accordance with varying equipment specifications. For example, different manufacturing equipment may be designed in accordance with software specifications requiring different types of database software.
  • each of the client application programs 104 may input and output data records in accordance with the requirements of different types of incompatible database programs (e.g., Java, XML, SQL, Codasyl).
  • the computer system 100 provides a solution to the incompatibility.
  • the computer system 100 includes a server computer 106, which is a computer subsystem of the computer system 100.
  • The computer system 100 also includes various databases 110, which are operably associated with the server computer 106.
  • Each of the databases 110 is configured to store different sets of data 112.
  • The different sets of data 112 have data structures that are formatted in accordance with different database schemas.
  • The set of data 112 stored in one of the databases 110 has data structures formatted in accordance with JSON database schemas.
  • The set of data 112 in another one of the databases 110 has data structures formatted in accordance with XML schemas.
  • The set of data 112 in yet another one of the databases 110 has data structures formatted in accordance with SQL database schemas.
  • The set of data 112 in still another one of the databases 110 has data structures formatted in accordance with Codasyl database schemas.
  • JSON, XML, SQL, and Codasyl are the data description languages used to define the database schemas of each of the databases 110.
  • a DCL and/or a data description language may or may not be a programming language.
  • JSON is a data description language that is independent of any programming language and is not itself a programming language (JSON is sometimes referred to as a language-independent data format). While JSON was originally intended to be a subset of the JavaScript scripting language, it is a text-based data description language that defines database schemas using attribute-value pairs and array data types. Many programming languages, including Java, include code that is capable of parsing JSON database schemas. However, JSON is strictly a data description language used for formatting the database schemas of the database and is not a DCL or, more generally, a programming language. On the other hand, SQL is a programming language that can be used as both the DCL and the data description language. These distinctions would be apparent to one of ordinary skill in the art.
  • The Java client application program 104 is configured to make database requests and receive data segments of the data 112 in the JSON database 110, where the data 112 is formatted in accordance with JSON schemas, without further assistance. However, the Java client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with XML, SQL, and Codasyl database schemas. Furthermore, the XML client application program 104 is configured to make database requests and receive data segments in the data 112 from the XML database 110, where the data 112 is formatted in accordance with XML database schemas, without further assistance.
  • The XML client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, SQL, and Codasyl database schemas. Additionally, the SQL client application program 104 is configured to make database requests and receive data segments in the data 112 from the SQL database 110, where the data 112 is formatted in accordance with SQL database schemas. However, the SQL client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, XML, and Codasyl database schemas.
  • The Codasyl client application program 104 is configured to make database requests and receive data segments in the data 112 from the Codasyl database 110, where the data 112 is formatted in accordance with Codasyl database schemas, without further assistance.
  • The Codasyl client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, XML, and SQL database schemas.
  • The server computer 106 is a computer system that is configured both to learn how the JSON, XML, SQL, and Codasyl database schemas relate to one another and, using these relations, to allow each of the client application programs 104 to make requests to the databases 110. As explained below, the client application programs 104 can make requests to read or write data to the databases 110 regardless of whether the DCL of the client application program 104 is compatible with the data description language of the database schemas of the databases 110.
  • The server computer 106 includes one or more central processing units (CPUs) 114, each including one or more processors 116.
  • The CPU(s) 114 may be a master device.
  • The CPU(s) 114 may also have cache memory 118 (which is a type of memory) coupled to the processor(s) 116 for rapid access to temporarily stored computer-executable instructions and register values.
  • The CPU(s) 114 are coupled to a system bus 120, where the system bus 120 is configured to intercouple master and slave devices included in the server computer 106.
  • The system bus 120 may be a bus interconnect.
  • The CPU(s) 114 communicate with these other devices by exchanging address, control, computer-executable instructions, and other information over the system bus 120.
  • the server computer 106 includes a memory system 122, one or more input/output devices 124, and one or more network interface devices 126.
  • the input/output device(s) 124 can include any type of input/output device including keyboards, displays, touchscreens, switches, microphones, speakers, and/or the like.
  • the network interface device(s) 126 can be any device configured to allow data exchange between the server computer 106 and the client computers 102, and between the server computer 106 and the databases 110.
  • The client computers 102, the server computer 106, and the databases 110 may all be part of a computer network, including but not limited to a wired or wireless network, private or public network, a local area network (LAN), a wide local area network (WLAN), and/or the Internet.
  • The network interface device(s) 126 can be configured to support any desired type of communication protocol used by different types of computer networks.
  • the memory system 122 can include one or more memories 128 that are configured to store computer-executable instructions 130.
  • The computer-executable instructions 130 may be loaded through the system bus 120 to the cache memory(ies) 118 of the CPU(s) 114.
  • The processor(s) 116 of the CPU(s) 114 are configured to execute the computer-executable instructions 130.
  • The computer-executable instructions 130 cause the processor(s) 116 to implement a machine learning network 132, which may be used to construct an NVTH 134.
  • The NVTH 134 is a canonicalization of the heterogeneous database schemas used to organize the data 112 in the various databases 110.
  • The computer-executable instructions 130 also cause the processor(s) 116 of the CPU(s) 114 to implement an API 136, which uses the NVTH 134 to allow the client application programs 104 to communicate with the databases 110 regardless of the data description language used to define the database schemas of the databases 110.
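The role of the API 136 as an intermediary can be sketched as a small facade that answers a read request in the client's schema no matter which database holds the record. This is an assumption-laden illustration: the `SchemaApi` class, its method names, the in-memory dictionaries standing in for the databases 110, and the flat field-renaming form of the NVTH are all hypothetical.

```python
class SchemaApi:
    """Hypothetical facade that answers reads in the client's own schema."""

    def __init__(self, nvth):
        # nvth: data type -> schema label -> {canonical name: local name}
        self.nvth = nvth
        # schema label -> {key: record stored in that schema}
        self.databases = {}

    def register(self, schema, records):
        """Attach an in-memory stand-in for a database that uses `schema`."""
        self.databases[schema] = records

    def read(self, client_schema, data_type, key):
        """Find the record in any database and return it in the client's schema."""
        for db_schema, records in self.databases.items():
            if key in records:
                stored = records[key]
                to_canon = self.nvth[data_type][db_schema]
                canonical = {c: stored[local] for c, local in to_canon.items()}
                to_client = self.nvth[data_type][client_schema]
                return {local: canonical[c] for c, local in to_client.items()}
        raise KeyError(key)


# Hypothetical NVTH with one data type known to two schemas.
nvth = {
    "Rectangle": {
        "json": {"width": "width_cm", "height": "height_cm"},
        "sql": {"width": "WIDTH", "height": "HEIGHT"},
    }
}
api = SchemaApi(nvth)
api.register("json", {"r1": {"width_cm": 40, "height_cm": 60}})
row = api.read("sql", "Rectangle", "r1")
```

A client that only understands the `sql` naming receives `WIDTH` and `HEIGHT` even though the record is stored under the `json` naming. A real system would also have to translate the request syntax itself, which this sketch leaves out.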
  • Figure 2 visually represents exemplary data involved in the construction of a data type of the NVTH 134.
  • the NVTH 134 shown in Figure 2 is constructed by the server computer 106 using the machine learning network 132 shown in Figure 1.
  • the construction of the NVTH 134 is described with respect to the JSON database 110 and the XML database 110. This is for the sake of simplicity and clarity.
  • the same principles and techniques described herein are applicable to construct an example of the NVTH 134 that canonicalizes the data 112 of any combination of the databases 110.
  • the described techniques are equally applicable to data 112 formatted in accordance with database schemas defined in accordance with other types of data description languages such as Python, PHP, Ruby, and Perl, just to name a few.
  • the data 112 does not necessarily have to be on separate databases 110.
  • some databases 110 are capable of storing data 112 formatted in accordance with database schemas defined in different data description languages in separate domains.
  • the described techniques can be utilized whenever there are database schemas that provide models that are not mutually exclusive but that are schematically incompatible.
  • Figure 2 illustrates that the data 112 is organized as data structures 200 formatted in accordance with database schemas 202 defined in JSON.
  • Each of the data structures 200 is the portion of the data 1 12 formatted in accordance with a different one of the database schemas 202.
  • the details of one of the data structures 200 and one of the database schemas 202 are shown in Figure 3A.
  • the database schema 202 that is shown in detail provides a JSON database model for pieces of fabric that are utilized to construct a shirt.
  • Database schemas defined in JSON are text based and may describe a data object archetype by relating properties and arrays to strings and values.
  • Figure 3A illustrates a detailed example of one of the data structures 200 and one of the database schemas 202.
  • the database schema 202 that is shown in detail in Figure 3A is a data object archetype, named “Shirt Fabric 1.”
  • the database schema 202 for “Shirt Fabric 1” defines a string named “design” to identify the design of a shirt and eight different (complex) properties named “left front panel,” “left sleeve,” “left cufflink,” “right front panel,” “right sleeve,” “right cufflink,” “back panel,” and “collar,” which model the corresponding fabric pieces used to build the different portions of the shirt (as named).
  • Each of these properties is complex and is associated with an integer value for a “bar code ID” (corresponding to a number on an attached bar code) and an array of integers named “measurement” that describes measurements in cm associated with each of the properties.
  • the data structure 200 (i.e., the data structure 1 shown in detail in Figure 3A) includes each of the data objects that are formatted in accordance with the database schema 202 named “Shirt Fabric 1.” The same is true for the other data structures 200 not shown in detail in that the data structures 200 each include data objects structured in accordance with a different one of the database schemas 202. As mentioned above, each of these database schemas 202 is defined using JSON as the data description language.
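  • As a minimal sketch of what one data object formatted in accordance with “Shirt Fabric 1” might look like, the Python below builds such an object and serializes it to JSON. The property names come from the description above; the helper name `make_shirt_fabric_1` and every value (design string, bar code IDs, measurements) are invented for illustration and do not appear in Figure 3A.

```python
import json

# The eight complex properties named in the "Shirt Fabric 1" schema.
PIECES = ["left front panel", "left sleeve", "left cufflink", "right front panel",
          "right sleeve", "right cufflink", "back panel", "collar"]


def make_shirt_fabric_1(design, bar_codes, measurements):
    """Build one data object: a design string plus eight complex properties.

    Hypothetical helper; the values passed in below are made up.
    """
    obj = {"design": design}
    for name, bar_code, measurement in zip(PIECES, bar_codes, measurements):
        obj[name] = {"bar code ID": bar_code, "measurement": list(measurement)}
    return obj


data_object = make_shirt_fabric_1(
    "D-17 cotton",                    # blueprint and fabric information in one string
    bar_codes=range(1001, 1009),      # one integer bar code ID per fabric piece
    measurements=[(60, 40)] * 8,      # measurements in cm
)
serialized = json.dumps(data_object)  # text form that a JSON database could store
```

  • Each complex property carries the same two fields, which is what later allows the pieces to be compared with the “piece 1” through “piece 8” properties of the XML schema.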
  • Figure 2 illustrates that the data 112 is organized as the data structures 204 formatted in accordance with database schemas 206 defined in XML.
  • Each of the data structures 204 is the portion of the data 112 formatted in accordance with a different one of the database schemas 206.
  • the details of one of the data structures 204 and one of the database schemas 206 are shown in Figure 3B.
  • Figure 3B illustrates a detailed example of one of the data structures 204 and one of the database schemas 206.
  • the database schema 206 provides an XML database model for pieces of fabric that are utilized to construct a shirt.
  • Database schemas defined in XML may describe a data record archetype by defining the contents of a record (e.g., a database document).
  • the database schema 206 that is shown in detail in Figure 3B is a data record archetype, named "Shirt Fabrication.”
  • the database schema 206 for "Shirt Fabrication” defines a property named "design number” to identify the design of a shirt and a property named “fabric type” to identify the type of fabric.
  • the database schema 206 defines the properties “piece 1” (which describes a fabric piece for the left front panel of the shirt), “piece 2” (which describes a fabric piece for the left sleeve of the shirt), “piece 3” (which describes a fabric piece for the left cufflink of the shirt), “piece 4” (which describes a fabric piece for the right front panel of the shirt), “piece 5” (which describes a fabric piece for the right sleeve of the shirt), “piece 6” (which describes a fabric piece for the right cufflink of the shirt), “piece 7” (which describes a fabric piece for the back panel of the shirt), and “piece 8” (which describes a fabric piece for the collar of the shirt).
  • Each of the properties is also associated with a value for a "bar code ID” and an array of integers describing measurements (named “measurement”) in cm associated with each of the properties.
  • the data structure 204 shown in detail in Figure 3B includes each of the data records that are formatted in accordance with the database schema 206 named "Shirt Fabrication.” The same is true for the other data structures 204 not shown in detail in that the data structures 204 each include data records structured in accordance with a different one of the database schemas 206. As mentioned above, each of these database schemas 206 is defined using XML as the data description language.
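  • A data record formatted in accordance with “Shirt Fabrication” could be sketched as follows, using Python's standard `xml.etree.ElementTree` module. XML element names cannot contain spaces, so the element and attribute names used here (`ShirtFabrication`, `designNumber`, `fabricType`, `piece` with a `number` attribute, `barCodeID`) are assumptions standing in for the property names in the description, and all values are invented.

```python
import xml.etree.ElementTree as ET


def make_shirt_fabrication(design_number, fabric_type, pieces):
    """Build one data record; `pieces` holds eight (bar_code_id, measurement) tuples.

    Hypothetical helper; element names are assumed, not taken from Figure 3B.
    """
    record = ET.Element("ShirtFabrication")
    ET.SubElement(record, "designNumber").text = design_number
    ET.SubElement(record, "fabricType").text = fabric_type
    for number, (bar_code, measurement) in enumerate(pieces, start=1):
        piece = ET.SubElement(record, "piece", number=str(number))
        ET.SubElement(piece, "barCodeID").text = str(bar_code)
        for value in measurement:  # one element per measurement in cm
            ET.SubElement(piece, "measurement").text = str(value)
    return record


record = make_shirt_fabrication(
    "D-17", "cotton", [(1000 + n, (60, 40)) for n in range(1, 9)]
)
xml_text = ET.tostring(record, encoding="unicode")  # text form of the record
```

  • Unlike the JSON sketch, the design information is split across two properties and the pieces are identified only by number, which is the schematic incompatibility the NVTH 134 is meant to bridge.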
  • the properties “left front panel,” “left sleeve,” “left cufflink,” “right front panel,” “right sleeve,” “right cufflink,” “back panel,” and “collar” of the data objects in data structure 200 are equivalent to the properties “piece 1,” “piece 2,” “piece 3,” “piece 4,” “piece 5,” “piece 6,” “piece 7,” and “piece 8” of the data records in the data structure 204, respectively.
  • the property string named "design” in the data objects of the data structure 200 represents a string with both blueprint and fabric information while the property “design number” of the data structure 204 includes the blueprint information and the "fabric type” property describes the fabric being used.
  • any kind of equivalency relationship between data substructures in the data structures 200 and the data structures 204 may be canonicalized by the NVTH 134. More complex relationships may exist such that one or more properties in one or more data structures 200 formatted in accordance with one or more database schemas 202 may be equivalent to one or more properties in multiple data structures 204 formatted in accordance with one or more database schemas 206. For example, groups of properties or data objects in multiple data structures 200 may be equivalent to groups of properties or groups of data records in multiple data structures 204. These and other equivalency relationships may exist between the database schemas 202 and the database schemas 206 as expressed through the data structures 200, 204.
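  • The equivalency relationships described above, including the one-to-many case for “design,” can be written down as a small lookup table. This is only an illustrative sketch: the tuple-pair representation and the function name `xml_equivalents` are assumptions, not a format given in the disclosure.

```python
JSON_PIECES = ["left front panel", "left sleeve", "left cufflink", "right front panel",
               "right sleeve", "right cufflink", "back panel", "collar"]

# Each entry pairs a group of JSON properties with the equivalent group of XML
# properties, so one-to-one and one-to-many relationships share one shape.
EQUIVALENCES = [((name,), (f"piece {n}",)) for n, name in enumerate(JSON_PIECES, start=1)]
EQUIVALENCES.append((("design",), ("design number", "fabric type")))  # one-to-many


def xml_equivalents(json_property):
    """Return the XML properties equivalent to a given JSON property."""
    for json_side, xml_side in EQUIVALENCES:
        if json_property in json_side:
            return xml_side
    raise KeyError(json_property)
```

  • Storing groups on both sides leaves room for the more complex many-to-many relationships mentioned above without changing the structure of the table.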
  • the server computer 106 is configured to implement the machine learning network 132 to identify equivalent data substructures in the data structures 200, 204 defined by the database schemas 202, 206. Furthermore, the server computer 106 is configured to construct the NVTH 134 that includes data types corresponding to one or more of the equivalent data substructures identified by the machine learning network 132.
  • the machine learning network 132 may be any suitable type of learning network, including, for example, one or more Bayesian network(s), neural network(s), heuristic model(s), stochastic model(s), decision tree learning model(s), genetic programming model(s), support vector machine(s), reinforcement learning model(s), regression model(s), Gaussian distribution model(s), and/or the like.
  • the server computer 106 may initially be configured to train the machine learning network 132 with test data 210 and target results data 212 prior to implementing the machine learning network 132 on the data structures 200, 204.
  • the test data 210 may include data formatted in accordance with database schemas defined by different data description languages.
  • the test data 210 may include data formatted in accordance with database schemas defined in JSON and in XML.
  • the target results data 212 will identify equivalent substructures in the test data 210 between the database schemas defined in JSON and in XML.
  • the target results data 212 will have been previously worked out as correctly identifying equivalent structures.
  • the machine learning network 132 may propose equivalency relations between the structures in the test data 210 and adjust these equivalency relations until the equivalency relations result in the target results data 212. Once trained, the machine learning network 132 can be implemented on the data structures 200, 204. It should, however, be noted that, in some embodiments, the machine learning network 132 may not require training and may be capable of identifying equivalent substructures between the data structures 200, 204 dynamically and in real-time. This, of course, may depend on the processing power of the server computer 106 and the capabilities of the particular machine learning network 132 being implemented.
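  • The propose-and-adjust training described above can be illustrated with a deliberately simple stand-in: a single similarity threshold is adjusted until the proposed equivalency relations agree with the target results. This is a toy sketch, not the machine learning network 132 itself; the similarity scores, the candidate thresholds, and the function names `propose` and `train_threshold` are all assumptions made for illustration.

```python
def propose(scores, threshold):
    """Propose every candidate pair whose similarity score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def train_threshold(scores, target, candidates):
    """Pick the candidate threshold whose proposals disagree least with the target."""
    def errors(threshold):
        # Symmetric difference counts both wrong proposals and missed pairs.
        return len(propose(scores, threshold) ^ target)
    return min(candidates, key=errors)


# Hypothetical test data: similarity scores for candidate (JSON, XML) property pairs.
test_scores = {
    ("left sleeve", "piece 2"): 0.92,
    ("left sleeve", "piece 5"): 0.55,
    ("collar", "piece 8"): 0.81,
    ("collar", "piece 1"): 0.30,
}
# Hypothetical target results: the pairs previously worked out as equivalent.
target_results = {("left sleeve", "piece 2"), ("collar", "piece 8")}

threshold = train_threshold(test_scores, target_results, [n / 10 for n in range(1, 10)])
```

  • A real learning network would adjust many parameters rather than one, but the loop has the same shape: propose relations on the test data, compare against the target results data, and keep the setting that reproduces them.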
  • the machine learning network 132 can thereby be used to determine equivalency relationships between data substructures of the data structures 200, 204. More specifically, the server computer 106 is configured to implement the machine learning network 132 to identify equivalent data substructures in the data structures 200, 204 defined by the database schemas 202, 206. In some embodiments, the machine learning network 132 identifies the equivalent data substructures in the data structures 200, 204 in a completely automated manner that requires no human input. However, in other embodiments, the server computer 106 may implement the machine learning network 132 to identify data substructures in the data structures 200, 204 that the machine learning network 132 predicts are equivalent. The server computer 106 may then be configured to receive user input from the input/output devices 124 indicating that these data substructures are one or more of the equivalent data substructures.
  • the machine learning network 132 would identify the properties “left front panel,” “left sleeve,” “left cufflink,” “right front panel,” “right sleeve,” “right cufflink,” “back panel,” and “collar” of the data objects in data structure 200 (illustrated in detail in Figure 3A) as being equivalent to the properties “piece 1,” “piece 2,” “piece 3,” “piece 4,” “piece 5,” “piece 6,” “piece 7,” and “piece 8” of the data records in the data structure 204 (illustrated in detail in Figure 3B).
  • the machine learning network 132 would identify the property named “design” in the data objects of the data structure 200 as equivalent to the properties named “design number” and “fabric type” in the data structure 204. Accordingly, the machine learning network 132 would identify the data objects defined by the database schema 202 (illustrated in detail in Figure 3A) as equivalent to the database records defined by the database schema 206 (illustrated in detail in Figure 3B).
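  • One simple way to picture the prediction step followed by user confirmation is a value-overlap heuristic, sketched below. It is a stand-in for the machine learning network 132, not a description of it, and it assumes that the i-th JSON data object and the i-th XML data record describe the same physical pieces and that both have already been parsed into Python dictionaries. The function names and sample values are hypothetical.

```python
from collections import Counter


def predict_equivalences(json_objects, xml_records):
    """Pair each JSON property with the XML property whose values agree most often."""
    votes = Counter()
    for obj, rec in zip(json_objects, xml_records):
        for j_name, j_value in obj.items():
            for x_name, x_value in rec.items():
                if j_value == x_value:
                    votes[(j_name, x_name)] += 1
    best = {}
    for (j_name, x_name), _count in votes.most_common():
        best.setdefault(j_name, x_name)  # keep only the highest-voted partner
    return best


def confirm(predicted, accept):
    """Keep only the predicted pairs that the user accepts (semi-automated mode)."""
    return {j: x for j, x in predicted.items() if accept(j, x)}


# Invented sample data: two manufacturing cycles seen through both schemas.
json_objects = [
    {"left sleeve": {"bar code ID": 11, "measurement": [60, 20]},
     "collar": {"bar code ID": 12, "measurement": [40, 8]}},
    {"left sleeve": {"bar code ID": 21, "measurement": [62, 21]},
     "collar": {"bar code ID": 22, "measurement": [41, 8]}},
]
xml_records = [
    {"piece 2": {"bar code ID": 11, "measurement": [60, 20]},
     "piece 8": {"bar code ID": 12, "measurement": [40, 8]}},
    {"piece 2": {"bar code ID": 21, "measurement": [62, 21]},
     "piece 8": {"bar code ID": 22, "measurement": [41, 8]}},
]

predicted = predict_equivalences(json_objects, xml_records)
confirmed = confirm(predicted, accept=lambda j, x: True)  # stands in for user input
```

  • In the fully automated embodiment the `confirm` step would simply be skipped; in the other embodiment the `accept` callback would be replaced by input received through the input/output devices 124.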
  • the server computer 106 is configured to construct the NVTH 134 that includes data types 214 corresponding to one or more of the equivalent data substructures identified by the machine learning network 132. These data types 214 may each include data subtypes that correspond to not only the equivalent substructures, but also the data description language, the database schemas 202, 206 of the data structures 200, 204, and schema transformation rules to and from the equivalent data substructures and the NVTH 134. In some embodiments, the server computer 106 is configured to create the NVTH 134 in an entirely automated manner.
  • the server computer 106 is configured to present user output through the input/output devices 124 to a user, wherein the user output describes the equivalent data substructures defined by the database schemas 202, 206.
  • the server computer 106 may then receive user input from the user through the input/output devices 124, wherein the user input semantically describes the data types 214 corresponding to the one or more equivalent data substructures in the data structures 200, 204.
  • the NVTH 134 may be constructed to include the data type 214 visually described in detail in Figure 2.
  • the data type 214 may be named “Shirt Fabrication Fabric” and correspond to the data objects and data records in the data structures 200, 204 that correspond to the "Shirt Fabric 1” database schema 202 and the "Shirt Fabrication” database schema 206.
  • the data type 214 includes a data subtype, named "data description language” for the data description language, a data subtype, named "SN” for the database schema name, and a data subtype, named "DTR” for data transformation rules.
  • the data type 214 also includes a data subtype, named "L front panel” that corresponds to the equivalent properties "left front panel” and "piece 1” in the specifically illustrated data structures 200, 204, respectively. Furthermore, the data type 214 includes a data subtype, named “L sleeve” that corresponds to the equivalent properties "left sleeve” and “piece 2” in the specifically illustrated data structures 200, 204, respectively. The data type 214 includes a data subtype, named "L cufflink” that corresponds to the equivalent properties "left cufflink” and "piece 3” in the specifically illustrated data structures 200, 204, respectively.
  • the data type 214 also includes a data subtype, named "R front panel” that corresponds to the equivalent properties "right front panel” and “piece 4” in the specifically illustrated data structures 200, 204, respectively. Furthermore, the data type 214 includes a data subtype, named “R sleeve” that corresponds to the equivalent properties "right sleeve” and “piece 5” in the specifically illustrated data structures 200, 204, respectively. The data type 214 includes a data subtype, named “R cufflink” that corresponds to the equivalent properties "right cufflink” and "piece 6” in the specifically illustrated data structures 200, 204, respectively.
  • the data type 214 includes a data subtype, named "B panel” that corresponds to the equivalent properties "back panel” and “piece 7” in the specifically illustrated data structures 200, 204, respectively. Also, the data type 214 includes a data subtype, named “shirt collar” that corresponds to the equivalent properties “collar” and “piece 8” in the specifically illustrated data structures 200, 204, respectively. Finally, the data type 214 includes a data subtype, named “Design Spec” that corresponds to the equivalent properties “Design” and both "design number” and “fabric type” in the specifically illustrated data structures 200, 204, respectively.
  • any data object formatted in accordance with the data structure 200 defined by the JSON database schema 202 named “Shirt Fabric 1” can be mapped to the data type 214 of the NVTH 134.
  • the data subtype for “data description language” would be JSON.
  • the data subtype for “SN” would be “Shirt Fabric 1.”
  • the data subtype “DTR” would be schema transformation rules to and from the JSON database schema 202 to the data type 214 of the NVTH 134.
  • the “left front panel,” “left sleeve,” “left cufflink,” “right front panel,” “right sleeve,” “right cufflink,” “back panel,” and “collar” of the data objects in data structure 200 would be mapped to the data subtypes “L front panel,” “L sleeve,” “L cufflink,” “R front panel,” “R sleeve,” “R cufflink,” “B panel,” and “shirt collar,” respectively, of the data type 214.
  • the property named “design” would be mapped by a function to the data subtype “Design Spec.”
  • any data record formatted in accordance with the data structure 204 defined by the XML database schema 206 named “Shirt Fabrication” can be mapped to the data type 214 of the NVTH 134.
  • the data subtype for “data description language” would be XML.
  • the data subtype for “SN” would be “Shirt Fabrication.”
  • the data subtype “DTR” would be schema transformation rules to and from the XML database schema 206 to the data type 214 of the NVTH 134.
  • the “piece 1,” “piece 2,” “piece 3,” “piece 4,” “piece 5,” “piece 6,” “piece 7,” and “piece 8” of the data records in data structure 204 would be mapped to the data subtypes “L front panel,” “L sleeve,” “L cufflink,” “R front panel,” “R sleeve,” “R cufflink,” “B panel,” and “shirt collar,” respectively, of the data type 214.
  • the properties named “design number” and “fabric type” would be mapped by a function to the data subtype “Design Spec.”
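  • The two mappings onto the data type 214 can be sketched side by side as follows. The subtype and property names are taken from the description; everything else is an assumption made for illustration: both inputs are assumed to be already parsed into Python dictionaries, the “DTR” subtype is represented simply by the renaming table, and the function that maps “design number” and “fabric type” to “Design Spec” is assumed to join them with a space.

```python
SUBTYPES = ["L front panel", "L sleeve", "L cufflink", "R front panel",
            "R sleeve", "R cufflink", "B panel", "shirt collar"]
JSON_PROPS = ["left front panel", "left sleeve", "left cufflink", "right front panel",
              "right sleeve", "right cufflink", "back panel", "collar"]
XML_PROPS = [f"piece {n}" for n in range(1, 9)]

JSON_TO_SUBTYPE = dict(zip(JSON_PROPS, SUBTYPES))
XML_TO_SUBTYPE = dict(zip(XML_PROPS, SUBTYPES))


def from_json_object(obj):
    """Map a "Shirt Fabric 1" data object onto the data type 214."""
    item = {"data description language": "JSON", "SN": "Shirt Fabric 1",
            "DTR": JSON_TO_SUBTYPE}
    for prop, subtype in JSON_TO_SUBTYPE.items():
        item[subtype] = obj[prop]
    item["Design Spec"] = obj["design"]
    return item


def from_xml_record(rec):
    """Map a "Shirt Fabrication" data record onto the data type 214."""
    item = {"data description language": "XML", "SN": "Shirt Fabrication",
            "DTR": XML_TO_SUBTYPE}
    for prop, subtype in XML_TO_SUBTYPE.items():
        item[subtype] = rec[prop]
    # Assumed mapping function: join the two XML properties into one string.
    item["Design Spec"] = f'{rec["design number"]} {rec["fabric type"]}'
    return item


def piece(n):
    """Invented piece values used by both sample inputs."""
    return {"bar code ID": 1000 + n, "measurement": [60, 40]}


json_obj = {"design": "D-17 cotton",
            **{p: piece(n) for n, p in enumerate(JSON_PROPS, start=1)}}
xml_rec = {"design number": "D-17", "fabric type": "cotton",
           **{p: piece(n) for n, p in enumerate(XML_PROPS, start=1)}}

a, b = from_json_object(json_obj), from_xml_record(xml_rec)
```

  • Apart from the three bookkeeping subtypes, the two results carry identical content, which is the sense in which the data type 214 canonicalizes the two schemas.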
  • the NVTH 134 can therefore be used to encapsulate the data 112 in the JSON database 110 and the data 112 in the XML database 110.
  • requests from both the Java client application program 104 and the XML client application program 104 can be handled for both the data 112 in the JSON database 110 and the data 112 in the XML database 110 without regard to the particular syntax and scheme defined by database schemas 202, 206.
  • data is "encapsulated” when the implementation details of a data structure, data substructure, and/or data item are hidden and/or protected from outside access.
  • Figure 4 visually represents exemplary data used to perform transformations with the NVTH 134.
  • the JSON database 110 stores the data structure 300 formatted in accordance with the JSON database schema 202 entitled “Shirt Fabric 1,” which was described in detail above.
  • the data structure 300 includes data objects 302, 304, which each have values for each of the properties assigned by the JSON database schema 202 entitled “Shirt Fabric 1.”
  • These data objects 302, 304 are therefore data substructures that are formatted in accordance with the JSON database schema 202 (entitled "Shirt Fabric 1”) in the data structure 300.
  • each of the data objects 302, 304 is a data output from manufacturing equipment 306.
  • the manufacturing equipment 306 includes the client computer 102 that implements the Java client application program 104, which generates each of the data objects 302, 304 to model materials and/or devices output from the manufacturing equipment 306.
  • the manufacturing equipment 306 may be an industrial textile cutting machine, which takes a large piece of fabric and cuts it into shapes, which can be sewn together to make a particular garment.
  • the properties of the data object 302 thereby model the particular fabric pieces cut out of a fabric for a particular shirt design and include bar code IDs for each of the bar codes provided with each piece of fabric.
  • the data object 304 is structured in the same manner but models pieces output from the manufacturing equipment 306 of fabric for a different shirt design and during a different manufacturing cycle.
  • the client computer 102 that implements the Java client application program 104 generates the data objects 302, 304 after their respective manufacturing cycles and sends them to the server computer 106 for storage in the JSON database 110.
  • the fabric pieces cut by the manufacturing equipment 306 are input materials for the manufacturing equipment 308, which in this example is an automated industrial sewing machine.
  • the manufacturing equipment 308 includes the client computer 102 that implements the XML client application program 104.
  • the XML client application program 104 requires that the data input modeling the fabric pieces be formatted in accordance with the XML database schema 206 entitled “Shirt Fabrication.”
  • the XML client application program 104 of the client computer 102 in the manufacturing equipment 308 would not be able to communicate with the JSON database 110.
  • the XML client application program 104 would not be able to use the information in the data objects 302, 304 without the API 136 and NVTH 134 since these data objects 302, 304 are formatted in accordance with the JSON database schema 202 (which is incompatible with the XML client application program 104).
  • a human would have to enter user input including blueprint identifiers, fabric identifiers, and bar codes in order for the manufacturing equipment 308 to operate, since the data output from the Java client application program 104 is incompatible with the XML client application program 104 and the XML database 110, which use XML database schemas 206, such as the XML database schema 206 entitled “Shirt Fabrication.”
  • the API 136 of this embodiment uses the NVTH 134 to provide database schema transformations.
  • the transfer of the fabric pieces from the manufacturing equipment 306 and the information from the data objects 302, 304 can be entirely automated despite the differences in database schemas 202, 206.
  • XML based data transformations may be generated as extensible stylesheet language transformations (XSLT).
  • the server computer 106 is configured to receive a data substructure formatted in accordance with one of the database schemas 202 and transform a data item based on the data substructure into an equivalent data substructure formatted in accordance with one of the database schemas 206.
  • a "data item” may be any set of data formatted in accordance with some kind of defined data archetype.
  • a data item may be a data object, a data record, a data file, a data table, a defined combination of one or more of the data components listed herein, and/or the like.
  • the server computer 106 is configured to receive a data substructure formatted in accordance with one of the database schemas 206 and transform a data item based on the data substructure into an equivalent data substructure formatted in accordance with one of the database schemas 202. This is because the data structures formatted in accordance with the database schemas 202 and the data substructures formatted in accordance with the database schemas 206 correspond with data types of the NVTH 134.
  • the data items transformed are the data substructures themselves while in other examples the data item may be the output of a function implemented by the API 136, as explained in further detail below.
  • the API 136 of the server computer 106 may transform the data object 302 into the data record 310 formatted in accordance with the database schema 206 entitled “Shirt Fabrication.”
  • the API 136 of the server computer 106 may encapsulate the data object 302 in accordance with the data type entitled "Shirt Fabrication Fabric” (shown in Figure 2) of the NVTH 134.
  • the API 136 encapsulates the data object 302 by recognizing that the data object 302 corresponds to the data type 214 entitled “Shirt Fabrication Fabric” of the NVTH 134 and then mapping the data object 302 to a buffer data item 312 formatted in accordance with the NVTH 134.
  • the data subtype for “data description language” of the buffer data item 312 would be JSON.
  • the data subtype for “SN” of the buffer data item 312 would be “Shirt Fabric 1.”
  • the data subtype “DTR” would be schema transformation rules to and from the JSON database schema 202 and the data type 214 of the NVTH 134.
  • the data object 302 is thereby encapsulated as the buffer data item 312 without regard to the syntax and specifics of JSON.
  • the API 136 of the server computer 106 may then transform the buffer data item 312 into the data record 310 formatted in accordance with the database schema 206 entitled “Shirt Fabrication” with the schema transformation rules. For example, the API 136 may map the buffer data item 312 to the data record 310 since the data record 310 corresponds to the data type 214.
  • the measurement and the bar code ID values of the data subtypes “L front panel,” “L sleeve,” “L cufflink,” “R front panel,” “R sleeve,” “R cufflink,” “B panel,” and “shirt collar” are mapped to the “piece 1,” “piece 2,” “piece 3,” “piece 4,” “piece 5,” “piece 6,” “piece 7,” and “piece 8” of the data record 310, respectively.
  • the values of the data subtype named “Design Spec” are mapped by a function to the properties “design number” and “fabric type” of the data record 310.
  • the data object 304 is transformed into the data record 314 in the same manner.
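  • The two-step transformation of a data object into a data record by way of a buffer data item might be sketched as below. This is an illustrative sketch under several assumptions: the XML element names are invented (XML names cannot contain spaces), the “design” string is assumed to hold the design number and fabric type separated by a space, and the measurement array is written as one space-separated text value. The function names `encapsulate` and `to_data_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

SUBTYPES = ["L front panel", "L sleeve", "L cufflink", "R front panel",
            "R sleeve", "R cufflink", "B panel", "shirt collar"]
JSON_PROPS = ["left front panel", "left sleeve", "left cufflink", "right front panel",
              "right sleeve", "right cufflink", "back panel", "collar"]


def encapsulate(data_object):
    """Step 1: JSON data object -> buffer data item keyed by NVTH subtype names."""
    buffer_item = {"data description language": "JSON", "SN": "Shirt Fabric 1",
                   "Design Spec": data_object["design"]}
    for prop, subtype in zip(JSON_PROPS, SUBTYPES):
        buffer_item[subtype] = data_object[prop]
    return buffer_item


def to_data_record(buffer_item):
    """Step 2: buffer data item -> XML data record (element names are assumed)."""
    # Assumed inverse of the design mapping: split on the first space.
    design_number, fabric_type = buffer_item["Design Spec"].split(" ", 1)
    record = ET.Element("ShirtFabrication")
    ET.SubElement(record, "designNumber").text = design_number
    ET.SubElement(record, "fabricType").text = fabric_type
    for number, subtype in enumerate(SUBTYPES, start=1):
        values = buffer_item[subtype]
        piece = ET.SubElement(record, "piece", number=str(number))
        ET.SubElement(piece, "barCodeID").text = str(values["bar code ID"])
        ET.SubElement(piece, "measurement").text = " ".join(map(str, values["measurement"]))
    return record


# Invented sample data object standing in for the data object 302.
data_object = {"design": "D-17 cotton"}
for n, prop in enumerate(JSON_PROPS, start=1):
    data_object[prop] = {"bar code ID": 1000 + n, "measurement": [60, 40]}

data_record = to_data_record(encapsulate(data_object))
```

  • Because the second step reads only subtype names, it does not depend on the JSON property names at all, which mirrors the statement that the buffer data item is handled without regard to the syntax and specifics of JSON.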
  • the server computer 106 may store the database records 310, 314 in a data structure 316 formatted in accordance with the database schema 206 entitled “Shirt Fabrication.” Accordingly, the XML client application program 104 implemented by the client computer 102 in the manufacturing equipment 308 may make a read request to the XML database 110 and obtain the database record 310 and/or the database record 314 that model the fabric pieces output from the manufacturing equipment.
  • the data records 310, 314 are not stored in the XML database 110. Rather, the XML client application program 104 makes requests to the JSON database 110 for the data objects 302, 304. As a response to the requests, the API 136 of the server computer 106 performs the above-described transformation to provide the database records 310, 314 to the XML client application program 104.
  • the data objects 302, 304 are not stored in the JSON database 110. Instead, the Java client application program 104 implemented by the client computer 102 in the manufacturing equipment 306 makes a write request to the XML database 110. The data objects 302, 304 would then be transformed by the API 136 into the data records 310, 314, as described above, and then the data records 310, 314 would be stored in the XML database 110.
  • the API 136 of the server computer 106 may also perform data functions for the Java and/or the XML client application programs 104.
  • the data functions may receive a data input from the JSON database 110 and provide a data output for the XML client application program 104. Additionally, the data functions may receive a data input from the XML database 110 and provide a data output for the Java client application program 104.
  • the API 136 of the server computer 106 is configured to implement data functions having a data substructure from the JSON database 110 as an input so as to generate a data item formatted in accordance with one of the database schemas 202.
  • the API 136 of the server computer 106 is then configured to transform the data item into a data substructure formatted in accordance with one of the XML database schemas 206.
  • the API 136 of the server computer 106 is configured to implement a data function having one of the data objects 302, 304 as input.
  • the data function generates a data item 322 having the "left front panel,” “left sleeve,” “left cufflink,” “right front panel,” “right sleeve,” “right cufflink,” “back panel,” and “collar” properties of the particular input data object 302, 304 in the order that the fabric pieces corresponding to these properties are to be received by the manufacturing equipment 306, 308.
  • the API 136 of the server computer 106 will then encapsulate the data item 322 in accordance with the data subtypes “L front panel,” “L sleeve,” “L cufflink,” “R front panel,” “R sleeve,” “R cufflink,” “B panel,” and “shirt collar” in the data type 214 of the NVTH 134.
  • the API 136 may then transform the encapsulated data item 322 into an equivalent data record 324 formatted in accordance with the “piece 1,” “piece 2,” “piece 3,” “piece 4,” “piece 5,” “piece 6,” “piece 7,” and “piece 8” of the database schema 206 entitled “Shirt Fabrication.”
  • the equivalent data record 324 may then be provided to the XML client application program 104.
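  • The data function described above, which orders the piece properties and then has its output transformed through the data type 214, could look roughly like this. The receiving order, the sample values, and the function names `ordering_function` and `transform` are assumptions for illustration; the disclosure does not specify them.

```python
JSON_PROPS = ["left front panel", "left sleeve", "left cufflink", "right front panel",
              "right sleeve", "right cufflink", "back panel", "collar"]
SUBTYPES = ["L front panel", "L sleeve", "L cufflink", "R front panel",
            "R sleeve", "R cufflink", "B panel", "shirt collar"]
XML_PROPS = [f"piece {n}" for n in range(1, 9)]

TO_SUBTYPE = dict(zip(JSON_PROPS, SUBTYPES))    # JSON property -> NVTH subtype
FROM_SUBTYPE = dict(zip(SUBTYPES, XML_PROPS))   # NVTH subtype -> XML property


def ordering_function(data_object, receive_order):
    """Data function: list the piece properties in the order they are to be received."""
    return [(prop, data_object[prop]) for prop in receive_order]


def transform(data_item):
    """Encapsulate by subtype, then rename to the XML piece names, keeping the order."""
    encapsulated = [(TO_SUBTYPE[prop], value) for prop, value in data_item]
    return [(FROM_SUBTYPE[subtype], value) for subtype, value in encapsulated]


# Invented sample input standing in for one of the data objects 302, 304.
data_object = {prop: {"bar code ID": 1000 + n, "measurement": [60, 40]}
               for n, prop in enumerate(JSON_PROPS, start=1)}
# Hypothetical order in which the sewing machine expects the pieces.
receive_order = ["back panel", "left front panel", "right front panel", "collar",
                 "left sleeve", "right sleeve", "left cufflink", "right cufflink"]

data_item = ordering_function(data_object, receive_order)  # plays the role of item 322
data_record = transform(data_item)                         # plays the role of record 324
```

  • The function's output is expressed in JSON property names, so it is the transformation step, not the function itself, that makes the result usable by the XML client application program 104.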
  • the API 136 of the server computer 106 is configured to implement data functions having a data substructure from the XML database 110 as an input so as to generate a data item formatted in accordance with one of the database schemas 206.
  • the API 136 of the server computer 106 is then configured to transform the data item into a data substructure formatted in accordance with one of the JSON database schemas 202.
  • the API 136 of the server computer 106 is configured to implement a data function having a design number and fabric type from the data record 310 as inputs.
  • the data function generates a data item 326 that has matching design numbers and fabric types.
  • the API 136 transforms the data item 326 into a data object 328 formatted in accordance with the database schema 202 entitled “Shirt Fabric 1” (see Figure 3A) using the data type 214 of the NVTH 134, as described above.
  • the data object 328 may then be provided to the Java client application program 104.
  • Figure 5 illustrates exemplary procedures of a machine learning method that may be implemented by a computer system (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
  • the computer system 100 may provide one or more databases that store data structures formatted in accordance with database schemas (procedure 400).
  • Each of the database(s) includes at least one of the data structures formatted in accordance with at least one of the database schemas.
  • a data structure may be any type of data organized in accordance with a database schema.
  • a data substructure may be any subset of the data in one or more data structures.
  • a data structure may be a collection of data objects or data records formatted in accordance with a database schema.
  • a data structure may also be data objects or data records along with lists, tables, metadata, and other associated digital information related to the data objects and data records.
  • a data substructure may be some subset of the data objects or data records, or some subset of properties, fields, or attributes in one or more of the data objects or data records, along with any associated digital information.
  • the computer system 100 may implement a machine learning network on the computer system 100 to identify a plurality of equivalent data substructures in the data structures defined by the database schemas (procedure 402).
  • the database schemas are defined in different data description languages.
  • implementing the machine learning network on the computer system 100 includes identifying equivalent data substructures in the data structures formatted in accordance with database schemas defined in the different data description languages.
  • the database schemas may be defined by the same data description language.
  • the machine learning network predicts the equivalency between data substructures and a user confirms or rejects the equivalency.
  • the computer system 100 may implement the machine learning network to identify data substructures in the data structures that the machine learning network predicts are equivalent and receive user input indicating that one or more of the identified data substructures are one or more of the equivalent data substructures.
  • the machine learning network identifies the equivalent substructures in an entirely automated manner.
  • the machine learning network may be any suitable type of machine learning network, as described above.
  • the machine learning network is implemented entirely as a software program.
  • the machine learning network may be implemented entirely by specially designed hardware.
  • the machine learning network is implemented in some combination of software and specially designed hardware.
  • the computer system 100 may construct an NVTH that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network (procedure 404). In some implementations, determining the relationships between the database schemas is entirely automated but the naming of the data types of the NVTH is only partially automated. Thus, constructing the NVTH includes presenting user output to a user on the computer system 100 where the user output describes the equivalent data substructures defined by the database schemas. Furthermore, the computer system 100 may receive user input from the user on the computer system 100 where the user input semantically describes the data types corresponding to the one or more equivalent data substructures. In other implementations, the naming is entirely automated.
  • the computer system 100 may initially be configured to train the machine learning network with test data and target results data (procedure 406). Thus, prior to procedure 402 (and/or procedure 400), the computer system 100 may perform procedure 406. Once trained, the machine learning network can be implemented to identify equivalent substructures. It should, however, be noted that, in some embodiments, the machine learning network does not require training and may be capable of identifying equivalent substructures dynamically and in real-time. This, of course, may depend on the processing power of the computer system 100 and the capabilities of the particular machine learning network being implemented.
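The semi-automated variant of procedure 402 (the network predicts equivalences and a user confirms or rejects them) can be sketched as follows. This is a minimal illustration only: a simple value-overlap score stands in for the machine learning network, and the field names, sample values, threshold, and `accept` callback are all hypothetical and do not appear in the disclosure.

```python
from itertools import product


def value_overlap(a, b):
    """Jaccard similarity of two collections of observed values."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def predict_equivalents(fields_a, fields_b, threshold=0.5):
    """Return (name_a, name_b, score) for pairs scoring at or above threshold."""
    candidates = []
    for (na, va), (nb, vb) in product(fields_a.items(), fields_b.items()):
        score = value_overlap(va, vb)
        if score >= threshold:
            candidates.append((na, nb, score))
    # Highest-scoring predictions first.
    return sorted(candidates, key=lambda c: -c[2])


def confirm(candidates, accept):
    """Keep only the predicted pairs that the reviewer callback accepts."""
    return [(na, nb) for na, nb, _ in candidates if accept(na, nb)]


# Hypothetical observed values per field in two differently named schemas.
schema_a = {"part_code": [1, 2, 3, 4], "batch": [10, 11, 12]}
schema_b = {"partId": [2, 3, 4, 5], "lot": [20, 21]}

candidates = predict_equivalents(schema_a, schema_b)
confirmed = confirm(candidates, accept=lambda a, b: True)
```

In the entirely automated variant, the `confirm` step would simply be skipped and every prediction above the threshold treated as an equivalence.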
  • Figure 6 illustrates exemplary procedures of a schema transformation method that may be implemented by a computer system 100 (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system 100. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
  • the computer system 100 may receive a first data substructure formatted in accordance with a first database schema (procedure 500).
  • the data substructure may be received from a database or may be received from a client application program, wherein the first data substructure corresponds with a first data type of the NVTH.
  • the computer system 100 may then transform a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure corresponds with the first data type of the NVTH (procedure 502).
  • the first database schema is defined in a first data description language and the second database schema is defined in a second data description language that is different from the first data description language.
  • the first database schema and the second database schema may be defined in the same data description language.
  • the data item is the first data substructure itself.
  • the computer system 100 transforms the first data substructure into the second data substructure at procedure 502.
  • the computer system 100 may encapsulate the first data substructure in accordance with the first data type of the NVTH and transform the encapsulated first data substructure into the second data substructure formatted in accordance with the second database schema.
  • the computer system 100 may encapsulate the first data substructure by recognizing that the first data substructure corresponds to the first data type of the NVTH and map the first data substructure to a buffer data item formatted in accordance with the first data type such that the buffer data item is the encapsulated first data substructure.
  • the data item may be the output of a data function rather than the first data substructure itself.
  • the computer system 100 may implement a function having the first data substructure as an input so as to generate the data item such that the data item is formatted in accordance with the first database schema (procedure 504).
  • the computer system 100 may then perform procedure 502 by encapsulating the data item in accordance with the first data type and transforming the encapsulated data item into the equivalent second data substructure.
  • the first data substructure stored by the first database is a data output from first manufacturing equipment and the equivalent second data substructure is a data input to a second database for second manufacturing equipment.
  • Although the specific examples discussed above with respect to Figure 4 related to an industrial fabric cutter and an automated industrial sewing machine, the above-described systems and methods are applicable whenever different manufacturing equipment needs to work together despite receiving and/or transmitting data inputs or data outputs formatted in accordance with incompatible database schemas. More broadly, the systems and methods described herein increase the efficiency and interoperability of computer systems with data stored in accordance with incompatible database schemas. This enhances the operation of the computer system and allows heterogeneous database programs to use data regardless of the database schema.
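Procedures 500 to 504 can be sketched as an encapsulate-then-transform step. The sketch below is only one possible reading: the `BufferItem` class, the rule tables, and the schema and field names (`cut_length_mm`, `seamLength`) are hypothetical stand-ins for the NVTH data types and transformation rules, which the disclosure does not specify in this form.

```python
from dataclasses import dataclass


@dataclass
class BufferItem:
    """Buffer data item formatted in accordance with an NVTH data type."""
    data_type: str
    value: object


# Hypothetical rules: (schema, field) -> NVTH data type, and the reverse.
TO_NVTH = {("schema_a", "cut_length_mm"): "length_mm"}
FROM_NVTH = {("schema_b", "length_mm"): "seamLength"}


def encapsulate(schema, field, value):
    """Recognize the NVTH data type of a substructure and map it to a buffer item."""
    return BufferItem(TO_NVTH[(schema, field)], value)


def transform(item, target_schema):
    """Procedure 502: emit the equivalent substructure in the target schema."""
    target_field = FROM_NVTH[(target_schema, item.data_type)]
    return {target_field: item.value}


def transform_function_output(schema, field, value, func, target_schema):
    """Procedure 504 then 502: apply func, then encapsulate and transform the result."""
    return transform(encapsulate(schema, field, func(value)), target_schema)


direct = transform(encapsulate("schema_a", "cut_length_mm", 120), "schema_b")
derived = transform_function_output(
    "schema_a", "cut_length_mm", 120, lambda v: v + 10, "schema_b"
)
```

Here `direct` corresponds to the case where the data item is the first data substructure itself, and `derived` to the case where the data item is the output of a function applied to it.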
  • Figure 7 illustrates the JSON database 110 that stores data 112 involved in the construction of an NVTH 700 (see Figure 1) and the XML database 110 that stores data 112 involved in the construction of an NVTH 702 (see Figure 1).
  • the NVTH 700 and the NVTH 702 shown in Figure 1 are both constructed by the server computer 106 using the machine learning network 132 shown in Figure 2.
  • the NVTH 700 is constructed using the JSON database 110 while the NVTH 702 is constructed using the XML database 110.
  • the same principles and techniques described herein are applicable to construct examples of the NVTH 700 and the NVTH 702 that canonicalize the data 112 of any combination of the databases 110 (e.g., the SQL database 110 and/or the Codasyl database 110 in Figure 1).
  • the exemplary data description languages in the example described above in Figure 1 are JSON, XML, SQL, and Codasyl.
  • the described techniques are equally applicable to data 112 formatted in accordance with database schemas defined in accordance with other types of data description languages such as Python, PHP, Ruby, Perl, just to name a few.
  • the data 112 does not necessarily have to be on separate databases 110.
  • databases 110 are capable of storing data 112 formatted using different data description languages in separate domains.
  • the described techniques can be utilized whenever subsets of the data 112 are not mutually exclusive but are schematically incompatible.
  • the database schemas for the data 112 in the JSON database 110 and for the data 112 in the XML database 110 are not provided to the computer system 100.
  • the data 112 stored in the JSON database 110 is organized as data structures 704 wherein each of these data structures 704 is formatted in JSON and is a portion of the data 112 in the JSON database 110.
  • the data structures 704 provide JSON data for pieces of automobiles and sections and parts of automobiles from a car manufacturer, which we will call ACME.
  • one of the data structures may include information regarding certain car models manufactured by ACME, design information regarding the motors of car models manufactured by ACME, and model information for car parts of cars manufactured by ACME.
  • the data 112 stored in the XML database 110 is organized as data structures 706 wherein each of these data structures 706 is formatted in XML and is a portion of the data 112 in the XML database 110.
  • the data structures 706 provide XML data for pieces of automobiles and sections and parts of automobiles from another car manufacturer, which we will call CARX.
  • one of the data structures may include information regarding certain car models manufactured by CARX, design information regarding the motors of car models manufactured by CARX, and model information for car parts of cars manufactured by CARX.
  • Some of the data structures 704 in the JSON database 110 may be equivalent to some of the data structures 706 in the XML database 110.
  • ACME and CARX may use the same seats, windshields, engine parts, tires, etc. for some of their car models.
  • the database schemas for the data structures 704, 706 are unknown, and therefore there is no way to know which of the data 112 is the same or otherwise equivalent.
  • database schemas must first be constructed for both the data structures 704 and the data structures 706 despite both being in different data description languages. These database schemas provide a canonicalization of the format of the data structures 704, 706 so that the format of the data structures 704, 706 can be compared.
  • the server computer 106 is configured to implement the machine learning network 132 so that the machine learning network 132 constructs the NVTH 700 from the data structures 704 in the JSON database 110.
  • the machine learning network 132 may be trained and may seek user input (as described above) in order to construct the NVTH 700.
  • the NVTH 700 thus provides a learned database schema for the data structures 704 based on the information analyzed by the machine learning network 132 regarding these data structures 704.
  • the NVTH 700 thus canonicalizes the data structures 704.
  • Figure 8A illustrates a detailed example of the NVTH 700, this time presuming that the NVTH 700 is canonicalizing the data structures 704 regarding the data 112 from ACME.
  • the NVTH 700 that is shown in detail in Figure 8A is a data object archetype, named "ACME AUTOMOBILE.”
  • the NVTH 700 is highly simplified.
  • the NVTH 700 is of the JSON type and includes data types such as a string named "car design” to identify the model of a car and six different properties named, "engine type,” “front seat type,” “back seat type,” “trunk design,” “car door combination,” and "radiator type.” Each of these properties is associated with an integer value that identifies the particular type of car part.
  • the data structures 704 may have included certain instances of only data that corresponds with the "engine type." Other instances of the data structures 704 may have included data that corresponds to only the "front seat type," "back seat type," "trunk design," "car door combination," or "radiator type," respectively. Still other instances of the data structures 704 may have included data that corresponds to the "car design" along with data that corresponds to all of the "engine type," "front seat type," "back seat type," "trunk design," "car door combination," and "radiator type."
  • the machine learning network 132 thus analyzes the data structures 704 to provide a canonicalization of the data and produce a format for the data structures 704, which is provided as the NVTH 700.
  • Although the data structures 704 are themselves formatted in JSON, the NVTH 700 may be provided in a common data description language, such as Java.
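How a single archetype might be derived from partial instances can be sketched as below. This is a deliberately simple stand-in for the machine learning network 132: it only merges the property names and value types observed across JSON instances. The instance contents and the `learn_archetype` helper are hypothetical.

```python
import json


def learn_archetype(name, instances):
    """Merge the property names and value types seen across JSON instances."""
    properties = {}
    for raw in instances:
        for key, value in json.loads(raw).items():
            # Record the type name the first time a property is seen.
            properties.setdefault(key, type(value).__name__)
    return {"name": name, "properties": properties}


# Hypothetical instances: some carry one property, one carries several.
instances = [
    '{"engine type": 3}',
    '{"trunk design": 7}',
    '{"car design": "sedan-x", "engine type": 3, "trunk design": 7}',
]

archetype = learn_archetype("ACME AUTOMOBILE", instances)
```

The resulting dictionary plays the role of the learned format: a named archetype whose properties cover every instance, whether that instance carried one property or all of them.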
  • the server computer 106 is also configured to implement the machine learning network 132 so that the machine learning network 132 constructs the NVTH 702 from the data structures 706 in the XML database 110.
  • the machine learning network 132 may be trained and may seek user input (as described above) in order to construct the NVTH 702.
  • the NVTH 702 thus provides a learned database schema for the data structures 706 based on the information analyzed by the machine learning network 132 regarding these data structures 706.
  • the NVTH 702 thus canonicalizes the data structures 706.
  • Figure 8B illustrates a detailed example of the NVTH 702, this time presuming that the NVTH 702 is canonicalizing the data structures 706 regarding the data 112 from CARX.
  • the NVTH 702 that is shown in detail in Figure 8B is a data object archetype, named "CARX AUTOMOBILE.”
  • the NVTH 702 is highly simplified.
  • the NVTH 702 is of the XML type and includes data types such as an integer named "car type" to identify the model of a car and six different properties named "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type." Each of these properties is associated with an integer value that identifies the particular type of car part. These names and data types may be learned by the machine learning network 132 to identify a workable format for the data structures 706. Note that "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type" are subschemas of the schema provided by the NVTH 702.
  • data structures 706 might have included certain instances of only data that corresponds with the "engine type.” Other instances of the data structures 706 may have included data that corresponds to only the "seat combination type,” “trunk type,” “front car doors,” “back car doors,” or “radiator type,” respectively.
  • Still other instances of the data structures 706 may have included data that corresponds to the "car type" along with data that corresponds to all of the "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type."
  • the machine learning network 132 thus analyzes the data structures 706 to provide a canonicalization of the data and produce a workable format for the data structures 706, which is provided as the NVTH 702.
  • Although the data structures 706 are themselves formatted in XML, the NVTH 702 may be provided in a common data description language, such as Java.
  • the server computer 106 can analyze the NVTH 700 and the NVTH 702 (along with the data structures 704, 706) to determine any equivalent subschemas between the NVTH 700 and NVTH 702. In this manner, the server computer can learn to translate between the data structures 704 and the data structures 706. More specifically, the server computer 106 is configured to implement the machine learning network 132 to learn to transform between subschemas in the NVTH 700 and the subschemas in the NVTH 702 when the subschemas of the NVTH 700 and the subschemas of the NVTH 702 are equivalent.
  • the "engine type" of the NVTH 700 is equivalent to the "engine type" of the NVTH 702.
  • the combination of the "front seat type" and "back seat type" of the NVTH 700 is equivalent to the "seat combination type" of the NVTH 702.
  • the "trunk design" of the NVTH 700 is equivalent to the "trunk type" of the NVTH 702.
  • the "car door combination" of the NVTH 700 is equivalent to the combination of the "front car doors" and the "back car doors" of the NVTH 702.
  • the "radiator type" of the NVTH 700 is equivalent to the "radiator type" of the NVTH 702.
  • the machine learning network 132 can identify the equivalent data substructures and the equivalent subschemas in the NVTH 700 and the NVTH 702. Furthermore, the machine learning network 132 learns transformation rules for transforming the subschemas of the NVTH 700 and the NVTH 702 based on the relationships between the data structures 704 and the data structures 706 and between the NVTH 700 and the NVTH 702. In this manner, the server computer 106 is configured to implement the machine learning network 132 that learns to transform between the subschemas in the NVTH 700 and the subschemas in the NVTH 702.
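The five equivalences listed above can be written down as a small lookup table, which makes the one-to-one, two-to-one, and one-to-two cases explicit. The table layout and the `equivalent_in_702` helper are assumptions made for illustration; the disclosure does not say how the learned relationships are stored.

```python
# Each entry pairs NVTH 700 subschema names with the equivalent NVTH 702 names.
EQUIVALENCES = [
    (("engine type",), ("engine type",)),
    (("front seat type", "back seat type"), ("seat combination type",)),
    (("trunk design",), ("trunk type",)),
    (("car door combination",), ("front car doors", "back car doors")),
    (("radiator type",), ("radiator type",)),
]


def equivalent_in_702(names_700):
    """Return the NVTH 702 subschemas equivalent to the given NVTH 700 subschemas."""
    wanted = tuple(sorted(names_700))
    for left, right in EQUIVALENCES:
        if tuple(sorted(left)) == wanted:
            return right
    # No equivalence is recorded for this combination.
    return None


seats = equivalent_in_702(["back seat type", "front seat type"])
doors = equivalent_in_702(["car door combination"])
```

Note that "car design" and "car type" do not appear in the table because the text above lists no equivalence between them.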
  • the server computer 106 may receive one of the data structures 704 from the JSON client computer 102, as a data substructure 704 of the NVTH 700.
  • the subschema that corresponds to the format of the data substructure 704 is equivalent to one of the subschemas in the NVTH 702.
  • the data substructure 704 may be formatted in accordance to the "trunk design” of the NVTH 700, which is equivalent to the "trunk type” of the NVTH 702.
  • the server computer 106 may implement the machine learning network 132 to transform data substructure 704 into a data substructure 706 formatted in accordance with the equivalent subschema of the NVTH 702. For example, assuming that the data substructure 704 is of the "trunk design” of the NVTH 700, the machine learning network 132 would generate a data substructure 706 that is formatted in accordance with the "trunk type” of the NVTH 702, where the data substructure 706 is equivalent to the data substructure 704. The machine learning network 132 would utilize learned transformation rules to provide the transformation. In the aforementioned example, integers for codes of the "trunk design” of the NVTH 700 may be transformed into integers for codes of the "trunk type” of the NVTH 702.
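The trunk example can be sketched with a plain lookup table standing in for a learned transformation rule. The code values in `TRUNK_CODE_RULE` and the XML element name `trunkType` are hypothetical; the disclosure states only that integer codes of one subschema may be transformed into integer codes of the other.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical learned rule: "trunk design" codes -> "trunk type" codes.
TRUNK_CODE_RULE = {1: 40, 2: 41, 3: 42}


def trunk_design_to_trunk_type(json_text):
    """Transform a JSON "trunk design" substructure into an XML trunk type element."""
    code = json.loads(json_text)["trunk design"]
    element = ET.Element("trunkType")
    element.text = str(TRUNK_CODE_RULE[code])
    return ET.tostring(element, encoding="unicode")


xml_text = trunk_design_to_trunk_type('{"trunk design": 2}')
```

A code with no entry in the rule table raises `KeyError` in this sketch, which is one simple way to surface substructures the rules do not yet cover.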
  • the machine learning network 132 implemented by the server computer 106 constructs the NVTHs 700, 702 and learns to provide the transformations so that data substructures can be transformed.
  • other computer devices on other computer systems may construct the NVTHs 700, 702 and learn to provide the transformations.
  • a second machine learning network on another computer system may be implemented so that the second machine learning network constructs the NVTH 700 from the data structures 704 and constructs the NVTH 702 from the data structures 706.
  • the server computer 106 may then receive the NVTHs 700, 702 and the data structures 704, 706 and implement the machine learning network 132 to learn to transform between the subschemas of the NVTHs 700, 702.
  • a second machine learning network on another computer system may be implemented so that the second machine learning network constructs the NVTH 700 from the data structures 704, and a third machine learning network on still another computer system may be implemented so that the third machine learning network constructs the NVTH 702 from the data structures 706.
  • the server computer 106 may then receive the NVTHs 700, 702 and the data structures 704, 706 and implement the machine learning network 132 to learn to transform between the subschemas of the NVTHs 700, 702.
  • the server computer 106 may implement the machine learning network 132 to construct the NVTHs 700, 702 while another machine learning network on another computer system learns to transform between the subschemas of the NVTHs 700, 702.
  • Figure 9 illustrates exemplary procedures of a machine learning method that may be implemented by a computer system (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
  • To implement the machine learning method, the computer system 100 may provide one or more databases that store a first set of data structures and a second set of data structures (procedure 900).
  • the computer system 100 may implement a machine learning network so that the machine learning network constructs a first NVTH from the first set of data structures (procedure 902). Additionally, the computer system 100 implements the machine learning network so that the machine learning network constructs a second NVTH from the second set of data structures (procedure 904).
  • the first set of data structures and the second set of data structures are defined in different data description languages.
  • the computer system 100 may implement the machine learning network so as to learn to transform between a first set of subschemas in the first NVTH and a second set of subschemas in the second NVTH, wherein the first set of subschemas and the second set of subschemas are equivalent (procedure 906).
  • a second computer system might implement a second machine learning network so that the second machine learning network constructs the first NVTH from the first set of data structures and constructs the second NVTH from the second set of data structures.
  • a second computer system may implement a second machine learning network so that the second machine learning network constructs the first NVTH from the first set of data structures and a third computer system implements a third machine learning network to construct the second NVTH from the second set of data structures.
  • the computer system 100 may receive a first data substructure that is formatted in accordance with a first corresponding one of the first set of subschemas, wherein the first corresponding one of the first set of subschemas is equivalent to a first corresponding one of the second set of subschemas (procedure 908).
  • the computer system 100 may implement the machine learning network on the computer system to transform the first data substructure into a second data substructure formatted in accordance with the first corresponding one of the second set of subschemas (procedure 910).
  • the first data substructure stored by the first database is a data output from a car parts manufacturer, which may be formatted in accordance to the requirements of a particular car company (e.g. ACME as discussed above).
  • the automobile manufacturer that is actually using the car part to make the automobile may require a different database format (e.g., a manufacturer for CARX may use the same part but a different database format) than the car parts manufacturer.
  • the machine learning network thus allows for transformations between the database formats so that the automobile manufacturer can use the part made by the car parts manufacturer in the automobile manufacturer's industrial automobile manufacturing process.
  • the systems and methods described herein thereby increase the efficiency and interoperability of computer systems with data stored in accordance with incompatible database formats.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems and methods are disclosed for performing database schema transformations. In one example, a machine learning network is used to identify a plurality of equivalent data substructures in data structures formatted in accordance with different database schemas. A name value type hierarchy (NVTH) is then constructed that includes data types corresponding to one or more of the plurality of equivalent data substructures. Schema transformations can then be performed using the NVTH. In particular, the NVTH may be used to encapsulate data substructures in one of the database schemas and then transform the encapsulated data into an equivalent substructure in another database schema.

Description

COMPUTER SYSTEMS AND METHODS FOR DATABASE SCHEMA TRANSFORMATIONS
Cross-Reference to Related Applications
[0001] This application cites the priority of US 62/809,599 filed 23 February 2019 and US 62/816,518 filed 11 March 2019, which are currently pending, and which are incorporated herein by reference in their entireties.
Field of the Disclosure
[0002] This disclosure relates generally to computer systems that utilize one or more databases and methods of operating the same.
Background
[0003] Databases are often used to store and collect digital data in a manner that can be easily accessed and updated. To do this, the digital data stored in the database has to be organized in a manner that allows a client program to find and write digital data to the database. In many circumstances, this is done with database schemas. More specifically, database schemas provide data archetypes for things (e.g., materials, devices, places, sale items) or operations (e.g., steps in a manufacturing process, directions to a location, shipping of sale items, monitored events) that are being modeled by the database. For example, certain manufacturing equipment may receive physical inputs (i.e, materials or devices received by the manufacturing equipment) and create physical outputs (i.e., materials or devices created by the manufacturing process implemented by the manufacturing equipment) from the physical inputs. Data inputs and data outputs may be used to model information regarding those physical inputs and physical outputs, respectively. These data inputs and data outputs may be stored and retrieved from a database. The database may organize these data inputs and data outputs in accordance with database schemas that define data archetypes for modeling the physical inputs and the physical outputs.
[0004] To automate an assembly line, the data outputs of one piece of manufacturing equipment need to become the data inputs for another piece of manufacturing equipment. Unfortunately, different pieces of manufacturing equipment often include computer systems that use different, incompatible software. For instance, the database schemas for one piece of manufacturing equipment may be defined in a different data description language than the database schemas of another piece of manufacturing equipment along the assembly line. Dealing with these incompatible database schemas is often difficult. More specifically, databases and client programs often assume incompatible database schemas, thereby preventing client programs from exchanging data with the databases, since the client program and/or the databases will not be able to understand requests and data organized in accordance with incompatible database schemas and/or using incompatible data description languages.
[0005] Accordingly, what are needed are new techniques that provide simple and universal solutions for translating incompatible database schemas and require less programming time by computer professionals.
Summary
[0006] This disclosure relates to systems and methods for translating incompatible database schemas. In one embodiment, a computer system performs a machine learning method. More specifically, one or more databases may be provided, wherein the database(s) store data structures formatted in accordance with database schemas and each of the database(s) includes at least one of the data structures formatted in accordance with at least one of the database schemas. The database schemas can use tables and foreign key connections to describe the structure of data in the database. The computer system implements a machine learning network to identify a plurality of equivalent data substructures in the data structures defined by the database schemas. The computer system then constructs a name value type hierarchy (NVTH) that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network. Accordingly, the machine learning network can be used to learn equivalent data substructures in incompatible database schemas in an automated fashion.
[0007] The NVTH can then be used to provide database schema transformations. Thus, in one embodiment, the computer system performs a schema transformation method. More specifically, the computer system receives a first data substructure formatted in accordance with a first database schema. The first data substructure corresponds with a first data type of the NVTH. The computer system may then transform a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure also corresponds with the first data type of the NVTH. In this manner, the NVTH can be utilized to translate data from one database schema to another incompatible database schema without requiring a specialized and complex computer program. It should be noted that, in some implementations, the data item is the first data substructure itself. In other implementations, the data item may be the output of a function that uses the first data substructure as input.
[0008] Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
Brief Description of the Drawings
[0009] The accompanying drawings incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
[0010] Figure 1 illustrates one embodiment of a computer system configured to implement a machine learning network that constructs an NVTH that can be utilized to translate data between incompatible database schemas.
[0011] Figure 2 visually represents exemplary data involved in the construction of a data type of the NVTH.
[0012] Figure 3A illustrates a detailed example of one of the database structures and one of the database schemas defined using JSON.
[0013] Figure 3B illustrates a detailed example of one of the database structures and one of the database schemas defined using XML.
[0014] Figure 4 visually represents exemplary data used to perform schema transformations with the NVTH.
[0015] Figure 5 illustrates exemplary procedures of a machine learning method.
[0016] Figure 6 illustrates exemplary procedures of a schema transformation method.
[0017] Figure 7 visually represents exemplary data involved in the construction of NVTHs from different data structures formatted in different data description languages.
[0018] Figure 8A illustrates a detailed example of an NVTH constructed from JSON data structures.
[0019] Figure 8B illustrates a detailed example of an NVTH constructed from XML data structures.
[0020] Figure 9 illustrates exemplary procedures of another machine learning method.
Detailed Description
[0021] The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the disclosure and illustrate the best mode of practicing the disclosure. Upon reading the following description in light of the accompanying drawings, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
[0022] The computational systems and methods disclosed herein enable client application programs to access schematically incompatible databases. In this disclosure, a database is a recognized database, in a recognized structure (e.g., tables). The computational systems and methods described herein canonicalize incompatible database schemas to construct a Name Value Type Hierarchy (NVTFI). The NVTFI defines data types assigned to semantically equivalent components of the database schemas. The NVTFI can then be used by an Application Programming Interface (API) to transform data formatted in one database schema into semantically equivalent data formatted in accordance with an incompatible database schema. To do this, the API may encapsulate the data in accordance with a corresponding data type of the NVTFI and then transform the encapsulated data into the semantically equivalent data of the other database schema. Thus, a client application program may send read or write requests for data in accordance with its database schema even though the database schema of the database storing the data is incompatible with the database schema of the client application program.
[0023] With the NVTH, the API can also perform a function on input data from a client application program and transform the resulting output data into semantically equivalent data formatted in accordance with the incompatible database schemas of the database. The NVTH can, in addition, allow the API to transform output data from a function performed by the client application program. Finally, the API can also use the NVTH to transform data formatted in accordance with the database schema of one database into semantically equivalent data formatted in accordance with an incompatible database schema of another database.
[0024] The systems and methods are particularly useful when dealing with database schemas defined in different data description languages. More specifically, the data types of the NVTH may be canonicalizations of equivalent components of database schemas defined in different data description languages. In some embodiments, this allows the API to encapsulate the data regardless of the data description language used to define the database schema. This encapsulation, in effect, allows the API to "observe" the semantics of the data without being impeded by syntax and the specifics of the data description languages. Schema transformation rules may be used by the API to map input and output data in the database schemas to the NVTH and provide schema transformations to and from the database schemas. The systems and methods thus improve the operation of computer systems by allowing databases and client application programs that use heterogeneous and incompatible database schemas to exchange and manipulate data without regard to the idiosyncrasies of the particular database schemas and/or the data description languages.
[0025] The systems and methods described herein may utilize machine learning networks implemented by a computer system to canonicalize the database schemas in order to construct the NVTH, as explained in further detail below. This is a significant advantage over previously known systems. First, the machine learning techniques described herein can determine the relationships for database schema transformations regardless of the number of things, operations, actions, and services being modeled and the variety of data description languages used to model them. Second, constructing an NVTH that canonicalizes the database schemas provides a generalized solution that does not require a human to work out complex transformations between the database schemas.
[0026] The systems and methods have wide application anywhere in the computer industry where incompatible database schemas are a problem. One particularly important application for the systems and methods described herein relates to computer systems utilized in manufacturing facilities. For instance, manufacturing facilities often have an assembly line with manufacturing equipment created by different manufacturers. Each piece of manufacturing equipment may include a computer system that communicates with a database in order to receive input information regarding the materials or devices received by the equipment and to transmit information regarding the materials or devices output from the manufacturing equipment. In order to automate the functions of the manufacturing equipment, input data is received by the manufacturing equipment regarding the materials and/or devices received by the equipment, and output data is transmitted to the database regarding the materials and/or devices output from the manufacturing equipment. This input data and output data often model different aspects of the input and output materials and/or devices in accordance with a database schema. However, if the other manufacturing equipment in the assembly line is designed by other manufacturers, that equipment may use different database schemas to model its input and output materials and/or devices. Thus, the same devices and materials may be modeled in accordance with different and incompatible database schemas. This is particularly problematic when different data description languages are used to create the database schemas and when the client application programs for each piece of manufacturing equipment use different languages to make queries and requests.
[0027] The systems and methods described herein provide a universal technique for solving these problems. In particular, machine learning networks can be used to identify a plurality of equivalent data substructures defined by different database schemas and create the NVTH, as explained in further detail below. The NVTH can then be used by the API to identify the equivalent data substructures when both pieces of manufacturing equipment are operating. This allows for the various pieces of manufacturing equipment to communicate input and output data regardless of the database schemas used to model the input and output materials and/or devices.
[0028] An NVTH is an extension of a name value hierarchy (NVH) in which the "type" describes a piece of the hierarchy. The self-describing nature of the "T" runs through the hierarchy of the NVTH. For example, an NVH in JSON has types (numbers, strings, etc.) as well as arrays and objects (hierarchy). In the NVTH, the "type" provides a hierarchy value for a data substructure along a schema. For example, in garment manufacturing, the data substructures may run from a piece of cloth, to a polygon, to points on the polygon. The NVTH thus allows data substructures in a data structure to be modularized within an overall data format provided by the NVTH. This allows the NVTH to provide a database schema without actually knowing anything about the formalized and digitally recorded database schema for a data object or data record. The schema is the "type" (e.g., Number, or a Type such as Polygon or Zipper). Thus, an NVTH provides a more modular schema.
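To make the name/value/type idea concrete, the following sketch models the cloth, polygon, and point example as a small tree in which every node carries its own type. The `Node` class, the type labels, and the coordinate values are illustrative assumptions rather than anything defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """Hypothetical NVTH node: a name, a self-describing type, and a value or children."""
    name: str
    type: str
    value: Any = None
    children: List["Node"] = field(default_factory=list)


def type_paths(node: Node, prefix: str = "") -> List[str]:
    """List the type path of every node, walking the hierarchy depth first."""
    path = f"{prefix}/{node.type}" if prefix else node.type
    paths = [path]
    for child in node.children:
        paths.extend(type_paths(child, path))
    return paths


# A piece of cloth, described by a polygon, described by its points (values assumed).
cloth = Node("left sleeve", "Cloth", children=[
    Node("outline", "Polygon", children=[
        Node("p0", "Point", value=(0, 0)),
        Node("p1", "Point", value=(60, 0)),
        Node("p2", "Point", value=(60, 20)),
    ]),
])
```

Because each node names its own type, a consumer can walk the tree and recover the structure (`Cloth`, `Cloth/Polygon`, `Cloth/Polygon/Point`) without consulting a separately recorded schema.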
[0029] The described systems and techniques may be performed by a computer system that includes a single computer or more than one computer. A computer may be a processor-controlled device, such as, by way of example, personal computers, workstations, servers, clients, minicomputers, mainframe computers, laptop computers, smartphones, tablets, a network of one or more individual computers, mobile computers, portable computers, handheld computers, palm-top computers, set-top boxes for a TV, interactive televisions, interactive kiosks, personal digital assistants, interactive wireless devices, or any combination thereof.
[0030] A computer may be a uniprocessor or multiprocessor machine. Accordingly, a computer may include one or more processors and, thus, the aforementioned computer system may also include one or more processors. Examples of processors include sequential state machines, microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
[0031] Additionally, the computer may include one or more memories. Accordingly, the aforementioned computer systems may include one or more memories. A memory may include a memory storage device or an addressable storage medium which may include, by way of example, random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), hard disks, floppy disks, laser disk players, digital video disks, compact disks, videotapes, audio tapes, magnetic recording tracks, magnetic tunnel junction (MTJ) memory, optical memory storage, quantum mechanical storage, electronic networks, and/or other devices or technologies used to store electronic content such as programs and data.
[0032] In particular, the one or more memories may store computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the procedures and techniques described herein. The one or more processors may be operably associated with the one or more memories so that the computer-executable instructions can be provided to the one or more processors for execution. For example, the one or more processors may be operably associated with the one or more memories through one or more buses. Furthermore, the computer may possess or may be operably associated with input devices (e.g., a keyboard, a keypad, a controller, a mouse, a microphone, a touch screen, a sensor) and output devices (e.g., a computer screen, a printer, or a speaker).
[0033] The computer may execute an appropriate operating system such as LINUX®, UNIX®, MICROSOFT® WINDOWS®, APPLE® MACOS®, IBM® OS/2®, ANDROID, PALM® OS, and/or the like. The computer may advantageously be equipped with a network communication device such as a network interface card, a modem, or other network connection device suitable for connecting to one or more networks.
[0034] A computer may advantageously contain control logic, or program logic, or other substrate configuration representing data and instructions, which cause the computer to operate in a specific and predefined manner as described herein. In particular, the computer programs, when executed, enable a control processor to perform and/or cause the performance of features of the present disclosure. The control logic may advantageously be implemented as one or more modules. The modules may advantageously be configured to reside on the computer memory and execute on the one or more processors. The modules include, but are not limited to, software or hardware components that perform certain tasks. Thus, a module may include, by way of example, components, such as software components, processes, functions, subroutines, procedures, attributes, class components, task components, object-oriented software components, segments of program code, drivers, firmware, micro-code, circuitry, data, and/or the like.
[0035] The control logic conventionally includes the manipulation of digital bits by the processor and the maintenance of these bits within memory storage devices resident in one or more of the memory storage devices. Such memory storage devices may impose a physical organization upon the collection of stored data bits, which are generally stored by specific electrical or magnetic storage cells.
[0036] The control logic generally performs a sequence of computer-executed steps. These steps generally require manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It is conventional for those skilled in the art to refer to these signals as bits, values, elements, symbols, characters, text, terms, numbers, files, or the like. It should be kept in mind, however, that these and some other terms should be associated with appropriate physical quantities for computer operations, and that these terms are merely conventional labels applied to physical quantities that exist within and during operation of the computer based on designed relationships between these physical quantities and the symbolic values they represent.
[0037] It should be understood that manipulations within the computer are often referred to in terms of adding, comparing, moving, searching, or the like, which are often associated with manual operations performed by a human operator. It is to be understood that no involvement of the human operator may be necessary, or even desirable. The operations described herein are machine operations performed in conjunction with the human operator or user that interacts with the computer or computers.
[0038] It should also be understood that the programs, modules, processes, methods, and the like, described herein are but an exemplary implementation and are not related, or limited, to any particular computer, apparatus, or computer language. Rather, various types of general-purpose computing machines or devices may be used with programs constructed in accordance with some of the teachings described herein. In some embodiments, very specific computing machines, with specific functionality, may be required. Similarly, it may prove advantageous to construct a specialized apparatus to perform the method steps described herein by way of dedicated computer systems with hard-wired logic or programs stored in nonvolatile memory, such as, by way of example, read-only memory (ROM).
[0039] In some embodiments, features of the computer systems can be implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). Implementation of the hardware circuitry will be apparent to persons skilled in the relevant art(s). In yet another embodiment, features of the computer systems can be implemented using a combination of both general-purpose hardware and software.
[0040] Figure 1 illustrates one embodiment of a computer system 100 configured to operate as described above. As shown in Figure 1, the computer system 100 includes a plurality of client computers 102. Each of the client computers 102 may be a processor-controlled device, such as, by way of example, personal computers, workstations, laptop computers, smartphones, tablets, individual computers, mobile computers, portable computers, handheld computers, palm-top computers, set-top boxes for a TV, interactive televisions, interactive kiosks, personal digital assistants, interactive wireless devices, and/or the like. Each of the client computers 102 may include one or more processors (not explicitly shown) and at least one memory (not explicitly shown) configured to store computer-executable instructions.
[0041] Among the various computer programs that are comprised of the computer-executable instructions, at least some of the computer-executable instructions are for client application programs 104 that are configured to make database requests for specific data. However, the client application programs 104 among the various client computers 102 may be configured to make database requests for different types of database systems and in particular may use different database control languages (DCLs). The DCLs provide the syntax used to control access to the data stored in the databases. However, when making a request to read or write data, the client application programs 104 need to identify the different data entities that are the subject of the request, which are formatted in accordance with the database schemas presumed to be used by the databases to store the data.
[0042] For example, in this embodiment, one of the client application programs 104 is a Java client application program, another one of the client application programs 104 is an XML client application program, yet another one of the client application programs 104 is an SQL client application program, and finally still another one of the client application programs 104 is a Codasyl client application program. As discussed in further detail below, each of the client application programs 104 is thus configured to make read database requests and receive data in response to those read database requests. Each of the client application programs 104 may also be configured to provide write database requests to write data to the databases. Each of these client application programs 104 will make these database requests and send/receive data associated with these database requests in accordance with the particular syntax required by the database management system for which the client application programs 104 are configured.
[0043] It should be noted that each of the client computers 102 may be provided as part of different manufacturing equipment within the assembly line of a manufacturing facility. For example, within an industrial manufacturing facility, various different types of manufacturing equipment may be provided along an assembly line in order to manufacture goods. It is common for different manufacturing equipment in the manufacturing facility to come from different companies and to be designed in accordance with varying equipment specifications. For example, different manufacturing equipment may be designed in accordance with software specifications requiring different types of database software. Accordingly, each of the client application programs 104 may input and output data records in accordance with the requirements of different types of incompatible database programs (e.g., Java, XML, SQL, Codasyl). The computer system 100 provides a solution to the incompatibility.
[0044] Still referring to Figure 1, the computer system 100 includes a server computer 106, which is a computer subsystem of the computer system 100. The computer system 100 also includes various databases 110, which are operably associated with the server computer 106. Each of the databases 110 is configured to store different sets of data 112. The different sets of data 112 have data structures that are formatted in accordance with different database schemas. For example, the set of data 112 stored in one of the databases 110 has data structures formatted in accordance with JSON database schemas. The set of data 112 in another one of the databases 110 has data structures formatted in accordance with XML schemas. The set of data 112 in yet another one of the databases 110 has data structures formatted in accordance with SQL database schemas. Finally, the set of data 112 in still another one of the databases 110 has data structures formatted in accordance with Codasyl database schemas. Thus, JSON, XML, SQL, and Codasyl are the data description languages used to define the database schemas of each of the databases 110.
[0045] It should be noted that a DCL and/or a data description language may or may not be a programming language. For example, JSON is a data description language that is independent of and is not a programming language (JSON sometimes is referred to as a language-independent data format). While JSON was originally intended to be a subset of the JavaScript scripting language, it is a text-based data description language that defines database schemas using attribute-value pairs and array data types. Many programming languages, including Java, include code that is capable of parsing JSON database schemas. However, JSON is strictly a data description language used for formatting the database schemas of the database and is not a DCL or, more generally, a programming language. On the other hand, SQL is a programming language that can be used as both the DCL and the data description language. These distinctions would be apparent to one of ordinary skill in the art.
[0046] Note that the Java client application program 104 is configured to make database requests and receive data segments in the data 112 from the JSON database 110, where the data 112 is formatted in accordance with JSON database schemas, without further assistance. However, the Java client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with XML, SQL, and Codasyl database schemas. Furthermore, the XML client application program 104 is configured to make database requests and receive data segments in the data 112 from the XML database 110, where the data 112 is formatted in accordance with XML database schemas, without further assistance. However, the XML client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, SQL, and Codasyl database schemas. Additionally, the SQL client application program 104 is configured to make database requests and receive data segments in the data 112 from the SQL database 110, where the data 112 is formatted in accordance with SQL database schemas. However, the SQL client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, XML, and Codasyl database schemas. Finally, the Codasyl client application program 104 is configured to make database requests and receive data segments in the data 112 from the Codasyl database 110, where the data 112 is formatted in accordance with Codasyl database schemas, without further assistance. However, the Codasyl client application program 104 is unable to make database requests or receive data segments in the data 112 formatted in accordance with JSON, XML, and SQL database schemas.
[0047] The server computer 106 is a computer system that is configured both to learn how the JSON, XML, SQL, and Codasyl database schemas relate to one another and, using these relations, to allow each of the client application programs 104 to make requests to the databases 110. As explained below, the client application programs 104 can make requests to read or write data to the databases 110 regardless of whether the DCL of the client application program 104 is compatible with the data description language of the database schemas of the databases 110.
[0048] The server computer 106 includes one or more central processing units (CPUs) 114, each including one or more processors 116. The CPU(s) 114 may be a master device. The CPU(s) 114 may also have cache memory 118 (which is a type of memory) coupled to the processor(s) 116 for rapid access to temporarily stored computer-executable instructions and register values. The CPU(s) 114 are coupled to a system bus 120, where the system bus 120 is configured to intercouple master and slave devices included in the server computer 106. The system bus 120 may be a bus interconnect. As is well known, the CPU(s) 114 communicate with these other devices by exchanging address, control, computer-executable instructions, and other information over the system bus 120. Furthermore, the server computer 106 includes a memory system 122, one or more input/output devices 124, and one or more network interface devices 126.
[0049] The input/output device(s) 124 can include any type of input/output device including keyboards, displays, touchscreens, switches, microphones, speakers, and/or the like. The network interface device(s) 126 can be any device configured to allow data exchange between the server computer 106 and the client computers 102, and between the server computer 106 and the databases 110. For example, the client computers 102, the server computer 106, and the databases 110 may all be part of a computer network, including but not limited to a wired or wireless network, private or public network, a local area network (LAN), a wide local area network (WLAN), and/or the Internet. The network interface device(s) 126 can be configured to support any desired type of communication protocol used by different types of computer networks.
[0050] The memory system 122 can include one or more memories 128 that are configured to store computer-executable instructions 130. In this embodiment, the computer-executable instructions 130 may be loaded through the system bus 120 to the cache memory(ies) 118 of the CPU(s) 114. In this manner, the processor(s) 116 of the CPU(s) 114 are configured to execute the computer-executable instructions 130. As explained in further detail below, the computer-executable instructions 130 cause the processor(s) 116 to implement a machine learning network 132, which may be used to construct an NVTH 134. The NVTH 134 is a canonicalization of the heterogeneous database schemas used to organize the data 112 in the various databases 110. The computer-executable instructions 130 also cause the processor(s) 116 of the CPU(s) 114 to implement an API 136, which uses the NVTH 134 to allow the client application programs 104 to communicate with the databases 110 regardless of the data description language used to define the database schemas of the databases 110.
[0051] Referring now to Figure 1 and Figure 2, Figure 2 visually represents exemplary data involved in the construction of a data type of the NVTH 134. In this embodiment, the NVTH 134 shown in Figure 2 is constructed by the server computer 106 using the machine learning network 132 shown in Figure 1. In this example, the construction of the NVTH 134 is described with respect to the JSON database 110 and the XML database 110. This is for the sake of simplicity and clarity. However, it should be noted that the same principles and techniques described herein are applicable to construct an example of the NVTH 134 that canonicalizes the data 112 of any combination of the databases 110. Furthermore, while the exemplary data description languages in the example described above in Figure 1 are JSON, XML, SQL, and Codasyl, the described techniques are equally applicable to data 112 formatted in accordance with database schemas defined in accordance with other types of data description languages such as Python, PHP, Ruby, and Perl, just to name a few. Furthermore, the data 112 does not necessarily have to be on separate databases 110. For example, some databases 110 are capable of storing data 112 formatted in accordance with database schemas defined in different data description languages in separate domains. Finally, the described techniques can be utilized whenever there are database schemas that provide models that are not mutually exclusive but that are schematically incompatible. Thus, the techniques described herein are applicable when there are databases 110 that store data 112 formatted in accordance with database schemas defined in the same data description language but where the organization used by the database schemas is incompatible. These and other applications for the principles and techniques described herein would be apparent to one of ordinary skill in the art in light of this disclosure.
[0052] With regards to the data 112 in the JSON database 110, Figure 2 illustrates that the data 112 is organized as data structures 200 formatted in accordance with database schemas 202 defined in JSON. Each of the data structures 200 is the portion of the data 112 formatted in accordance with a different one of the database schemas 202. For the sake of explanation, the details of one of the data structures 200 and one of the database schemas 202 are shown in Figure 3A. In this example, the database schema 202 that is shown in detail provides a JSON database model for pieces of fabric that are utilized to construct a shirt. Database schemas defined in JSON are text based and may describe a data object archetype by relating properties and arrays to strings and values.
[0053] Figure 3A illustrates a detailed example of one of the data structures 200 and one of the database schemas 202. The database schema 202 that is shown in detail in Figure 3A is a data object archetype named "Shirt Fabric 1." Furthermore, the database schema 202 for "Shirt Fabric 1" defines a string named "design" to identify the design of a shirt and eight different (complex) properties, named "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar," which model the corresponding fabric pieces used to build the different portions of the shirt (as named). Each of these properties is complex and is associated with an integer value for a "bar code ID" (corresponding to a number on an attached bar code) and an array of integers named "measurement" that describe measurements in cm associated with each of the properties. The data structure 200 (i.e., data structure 1) shown in detail in Figure 3A includes each of the data objects that are formatted in accordance with the database schema 202 named "Shirt Fabric 1." The same is true for the other data structures 200 not shown in detail in that the data structures 200 each include data objects structured in accordance with a different one of the database schemas 202. As mentioned above, each of these database schemas 202 is defined using JSON as the data description language.
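As a rough illustration of a data object shaped like the "Shirt Fabric 1" archetype, the sketch below builds one object, serializes it as JSON, and checks its shape. Figure 3A is not reproduced here, so the sample values (design string, bar code numbers, measurements) and the helper name `conforms` are assumptions; only the property names come from the description above.

```python
import json

# The eight complex properties named in the description of "Shirt Fabric 1".
PIECES = [
    "left front panel", "left sleeve", "left cufflink",
    "right front panel", "right sleeve", "right cufflink",
    "back panel", "collar",
]


def conforms(obj: dict) -> bool:
    """Check that obj has a string "design" and eight well-formed piece properties."""
    if not isinstance(obj.get("design"), str):
        return False
    for name in PIECES:
        piece = obj.get(name)
        if not isinstance(piece, dict):
            return False
        if not isinstance(piece.get("bar code ID"), int):
            return False
        measurement = piece.get("measurement")
        if not isinstance(measurement, list):
            return False
        if not all(isinstance(m, int) for m in measurement):
            return False
    return True


# Hypothetical sample object; every value below is made up for illustration.
shirt = {"design": "D-17 cotton"}
for index, name in enumerate(PIECES, start=1):
    shirt[name] = {"bar code ID": 1000 + index, "measurement": [40, 60]}

shirt_json = json.dumps(shirt, indent=2)
```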
[0054] With regards to the data 112 in the XML database 110, Figure 2 illustrates that the data 112 is organized as the data structures 204 formatted in accordance with database schemas 206 defined in XML. Each of the data structures 204 is the portion of the data 112 formatted in accordance with a different one of the database schemas 206. For the sake of explanation, the details of one of the data structures 204 and one of the database schemas 206 are shown in Figure 3B.
[0055] Figure 3B illustrates a detailed example of one of the data structures 204 and one of the database schemas 206. In this example, the database schema 206 provides an XML database model for pieces of fabric that are utilized to construct a shirt. Database schemas defined in XML may describe a data record archetype by defining the contents of a record (e.g., a database document). Thus, the database schema 206 that is shown in detail in Figure 3B is a data record archetype named "Shirt Fabrication." Furthermore, the database schema 206 for "Shirt Fabrication" defines a property named "design number" to identify the design of a shirt and a property named "fabric type" to identify the type of fabric. Furthermore, the database schema 206 defines properties "piece 1" (which describes a fabric piece for the left front panel of the shirt), "piece 2" (which describes a fabric piece for the left sleeve of the shirt), "piece 3" (which describes a fabric piece for the left cufflink of the shirt), "piece 4" (which describes a fabric piece for the right front panel of the shirt), "piece 5" (which describes a fabric piece for the right sleeve of the shirt), "piece 6" (which describes a fabric piece for the right cufflink of the shirt), "piece 7" (which describes a fabric piece for the back panel of the shirt), and "piece 8" (which describes a fabric piece for the collar of the shirt). Each of the properties is also associated with a value for a "bar code ID" and an array of integers describing measurements (named "measurement") in cm associated with each of the properties. The data structure 204 shown in detail in Figure 3B includes each of the data records that are formatted in accordance with the database schema 206 named "Shirt Fabrication." The same is true for the other data structures 204 not shown in detail in that the data structures 204 each include data records structured in accordance with a different one of the database schemas 206. As mentioned above, each of these database schemas 206 is defined using XML as the data description language.
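A data record in the spirit of "Shirt Fabrication" might be read as sketched below. Figure 3B is not reproduced here, so the element layout is an assumption: XML element names cannot contain spaces, so this sketch uses hypothetical names such as `design_number` and a `piece` element with an `n` attribute, and all values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; only two of the eight pieces are shown for brevity.
RECORD = """
<ShirtFabrication>
  <design_number>17</design_number>
  <fabric_type>cotton</fabric_type>
  <piece n="1"><bar_code_ID>1001</bar_code_ID><measurement>40 60</measurement></piece>
  <piece n="2"><bar_code_ID>1002</bar_code_ID><measurement>20 55</measurement></piece>
</ShirtFabrication>
"""


def parse_record(xml_text: str) -> dict:
    """Parse one record into a dict keyed by the property names used in the text."""
    root = ET.fromstring(xml_text.strip())
    record = {
        "design number": root.findtext("design_number"),
        "fabric type": root.findtext("fabric_type"),
    }
    for piece in root.findall("piece"):
        record[f"piece {piece.get('n')}"] = {
            "bar code ID": int(piece.findtext("bar_code_ID")),
            # Measurements are stored here as space-separated integers (an assumption).
            "measurement": [int(m) for m in piece.findtext("measurement").split()],
        }
    return record
```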
[0056] Note that the properties "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar" of the data objects in the data structure 200 are equivalent to the properties "piece 1," "piece 2," "piece 3," "piece 4," "piece 5," "piece 6," "piece 7," and "piece 8" of the data records in the data structure 204, respectively. With regards to the database schema 202, the property string named "design" in the data objects of the data structure 200 represents a string with both blueprint and fabric information, while the property "design number" of the data structure 204 includes the blueprint information and the "fabric type" property describes the fabric being used. As such, the relationship between "design" of the database schema 202 named "Shirt Fabric 1" and both "design number" and "fabric type" of the database schema 206 named "Shirt Fabrication" can be described by a function. Note therefore that each of these are equivalent data substructures between the data structure 200 illustrated in detail in Figure 3A and the data structure 204 illustrated in detail in Figure 3B. Furthermore, each of the data objects in the data structure 200 illustrated in detail in Figure 3A and each of the data records in the data structure 204 illustrated in detail in Figure 3B are equivalent data substructures.
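The equivalences described above can be written down as a one-to-one property map plus a function relating "design" to "design number" and "fabric type". The sketch below is illustrative only: the disclosure does not say how "design" encodes its two parts, so the `"<number>/<fabric>"` encoding and the function names are assumptions.

```python
# One-to-one equivalences between the two schemas, as listed in the text.
PIECE_MAP = {
    "left front panel": "piece 1", "left sleeve": "piece 2",
    "left cufflink": "piece 3", "right front panel": "piece 4",
    "right sleeve": "piece 5", "right cufflink": "piece 6",
    "back panel": "piece 7", "collar": "piece 8",
}
REVERSE_PIECE_MAP = {dst: src for src, dst in PIECE_MAP.items()}


def object_to_record(obj: dict) -> dict:
    """Convert a "Shirt Fabric 1" style object to a "Shirt Fabrication" style record."""
    # Assumed encoding of "design": "<design number>/<fabric type>".
    number, fabric = obj["design"].split("/", 1)
    record = {"design number": number, "fabric type": fabric}
    for src, dst in PIECE_MAP.items():
        if src in obj:
            record[dst] = obj[src]
    return record


def record_to_object(record: dict) -> dict:
    """Inverse conversion, rebuilding "design" from its two component properties."""
    obj = {"design": f"{record['design number']}/{record['fabric type']}"}
    for src, dst in REVERSE_PIECE_MAP.items():
        if src in record:
            obj[dst] = record[src]
    return obj
```

The piece properties map by simple renaming, while "design" needs a function in each direction, which is the distinction the paragraph draws.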
[0057] With regard to the data structures 200 and the data structures 204 in Figure 2, any kind of equivalency relationship between data substructures in the data structures 200 and the data structures 204 may be canonicalized by the NVTH 134. More complex relationships may exist such that one or more properties in one or more data structures 200 formatted in accordance with one or more database schemas 202 may be equivalent to one or more properties in multiple data structures 204 formatted in accordance with one or more database schemas 206. For example, groups of properties or data objects in multiple data structures 200 may be equivalent to groups of properties or groups of data records in multiple data structures 204. These and other equivalency relationships may exist between the database schemas 202 and the database schemas 206 as expressed through the data structures 200, 204.
[0058] Referring now to Figure 1, Figure 2, Figure 3A, and Figure 3B, the server computer 106 is configured to implement the machine learning network 132 to identify equivalent data substructures in the data structures 200, 204 defined by the database schemas 202, 206. Furthermore, the server computer 106 is configured to construct the NVTH 134 that includes data types corresponding to one or more of the equivalent data substructures identified by the machine learning network 132.
[0059] The machine learning network 132 may be any suitable type of learning network, including, for example, one or more Bayesian network(s), neural network(s), heuristic model(s), stochastic model(s), decision tree learning model(s), genetic programming model(s), support vector machine(s), reinforcement learning model(s), regression model(s), Gaussian distribution model(s), and/or the like.
[0060] In one embodiment, the server computer 106 may initially be configured to train the machine learning network 132 with test data 210 and target results data 212 prior to implementing the machine learning network 132 on the data structures 200, 204. The test data 210 may include data formatted in accordance to database schemas defined by different data description languages. For example, with respect to the example described for Figure 2, the test data 210 may include data formatted in accordance with database schemas defined in JSON and in XML. The target results data 212 will identify equivalent substructures in the test data 210 between the database schemas defined in JSON and in XML. The target results data 212 will have been previously worked out as correctly identifying equivalent structures. The machine learning network 132 may propose equivalency relations between the structures in the test data 210 and adjust these equivalency relations until the equivalency relations result in the target results data 212. Once trained, the machine learning network 132 can be implemented on the data structures 200, 204. It should, however, be noted that, in some embodiments, the machine learning network 132 may not require training and may be capable of identifying equivalent substructures between the data structures 200, 204 dynamically and in real-time. This, of course, may depend on the processing power of the server computer 106 and the capabilities of the particular machine learning network 132 being implemented.
[0061] The machine learning network 132 can thereby be used to determine equivalency relationships between data substructures of the data structures 200, 204. More specifically, the server computer 106 is configured to implement the machine learning network 132 to identify equivalent data substructures in the data structures 200, 204 defined by the database schemas 202, 206. In some embodiments, the machine learning network 132 identifies the equivalent data substructures in the data structures 200, 204 in a completely automated manner that requires no human input. However, in other embodiments, the server computer 106 may implement the machine learning network 132 to identify data substructures in the data structures 200, 204 that the machine learning network 132 predicts are equivalent. The server computer 106 may then be configured to receive user input from the input/output devices 124 indicating that these data substructures are one or more of the equivalent data substructures.
[0062] In the specific example described above, the machine learning network 132 would identify the properties "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar" of the data objects in the data structure 200 (illustrated in detail in Figure 3A) as being equivalent to the properties "piece 1," "piece 2," "piece 3," "piece 4," "piece 5," "piece 6," "piece 7," and "piece 8" of the data records in the data structure 204 (illustrated in detail in Figure 3B). Furthermore, the machine learning network 132 would identify the property named "design" in the data objects of the data structure 200 as equivalent to the properties named "design number" and "fabric type" in the data structure 204. Accordingly, the machine learning network 132 would identify the data objects defined by the database schema 202 (illustrated in detail in Figure 3A) as equivalent to the data records defined by the database schema 206 (illustrated in detail in Figure 3B).
[0063] The server computer 106 is configured to construct the NVTH 134 that includes data types 214 corresponding to one or more of the equivalent data substructures identified by the machine learning network 132. These data types 214 may each include data subtypes that correspond to not only the equivalent substructures, but also the data description language, the database schemas 202, 206 of the data structures 200, 204, and schema transformation rules to and from the equivalent data substructures and the NVTH 134. In some embodiments, the server computer 106 is configured to create the NVTH 134 in an entirely automated manner. However, in other embodiments, the server computer 106 is configured to present user output through the input/output devices 124 to a user, wherein the user output describes the equivalent data substructures defined by the database schemas 202, 206. The server computer 106 may then receive user input from the user through the input/output devices 124, wherein the user input semantically describes the data types 214 corresponding to the one or more equivalent data substructures in the data structures 200, 204.
[0064] With regards to the specific example described above, the NVTH 134 may be constructed to include the data type 214 visually described in detail in Figure 2. The data type 214 may be named "Shirt Fabrication Fabric” and correspond to the data objects and data records in the data structures 200, 204 that correspond to the "Shirt Fabric 1” database schema 202 and the "Shirt Fabrication” database schema 206. The data type 214 includes a data subtype, named "data description language” for the data description language, a data subtype, named "SN” for the database schema name, and a data subtype, named "DTR” for data transformation rules. The data type 214 also includes a data subtype, named "L front panel” that corresponds to the equivalent properties "left front panel” and "piece 1” in the specifically illustrated data structures 200, 204, respectively. Furthermore, the data type 214 includes a data subtype, named "L sleeve” that corresponds to the equivalent properties "left sleeve” and "piece 2” in the specifically illustrated data structures 200, 204, respectively. The data type 214 includes a data subtype, named "L cufflink” that corresponds to the equivalent properties "left cufflink” and "piece 3” in the specifically illustrated data structures 200, 204, respectively. The data type 214 also includes a data subtype, named "R front panel” that corresponds to the equivalent properties "right front panel” and "piece 4” in the specifically illustrated data structures 200, 204, respectively. Furthermore, the data type 214 includes a data subtype, named "R sleeve” that corresponds to the equivalent properties "right sleeve” and "piece 5” in the specifically illustrated data structures 200, 204, respectively. The data type 214 includes a data subtype, named "R cufflink” that corresponds to the equivalent properties "right cufflink” and "piece 6” in the specifically illustrated data structures 200, 204, respectively. 
Additionally, the data type 214 includes a data subtype, named "B panel” that corresponds to the equivalent properties "back panel” and "piece 7” in the specifically illustrated data structures 200, 204, respectively. Also, the data type 214 includes a data subtype, named "shirt collar” that corresponds to the equivalent properties "collar” and "piece 8” in the specifically illustrated data structures 200, 204, respectively. Finally, the data type 214 includes a data subtype, named "Design Spec” that corresponds to the equivalent properties "Design” and both "design number” and "fabric type” in the specifically illustrated data structures 200, 204, respectively.
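As an illustration only, the data type 214 described above might be sketched in Python as follows. The subtype names come from the description; the class names, the field names, the representation of "DTR" as a string key, and the internal layout of "Design Spec" are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Subtype names taken from the description of the data type 214.
PIECE_SUBTYPES = [
    "L front panel", "L sleeve", "L cufflink",
    "R front panel", "R sleeve", "R cufflink",
    "B panel", "shirt collar",
]


@dataclass
class Piece:
    """One fabric-piece subtype: a bar code ID and measurements in cm."""
    bar_code_id: str
    measurement: List[int]


@dataclass
class ShirtFabricationFabric:
    """Hypothetical rendering of the data type 214."""
    data_description_language: str  # e.g. "JSON" or "XML"
    sn: str                         # database schema name
    dtr: str                        # key identifying the transformation rules (assumed)
    design_spec: Dict[str, str]     # "Design Spec"; layout is an assumption
    pieces: Dict[str, Piece] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """True when every piece subtype of the data type has a value."""
        return all(name in self.pieces for name in PIECE_SUBTYPES)


# Example item with made-up values, as it might look for a JSON source.
item = ShirtFabricationFabric(
    data_description_language="JSON",
    sn="Shirt Fabric 1",
    dtr="json-shirt-fabric-1",
    design_spec={"design number": "D-17", "fabric type": "cotton"},
    pieces={
        name: Piece(f"BC{i:03d}", [10, 20])
        for i, name in enumerate(PIECE_SUBTYPES, start=1)
    },
)
```

Keeping the source language and schema name alongside the values is what would let a later step select the matching transformation rules.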
[0065] In this manner, any data object formatted in accordance with the data structure 200 defined by the JSON database schema 202 named "Shirt Fabric 1" can be mapped to the data type 214 of the NVTH 134. In this case, the data subtype for "data description language" would be JSON, the data subtype for "SN" would be "Shirt Fabric 1," and the data subtype "DTR" would be schema transformation rules to and from the JSON database schema 202 and the data type 214 of the NVTH 134. In accordance with the schema transformation rules, the "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar" of the data objects in the data structure 200 would be mapped to the data subtypes "L front panel," "L sleeve," "L cufflink," "R front panel," "R sleeve," "R cufflink," "B panel," and "shirt collar," respectively, of the data type 214. Furthermore, the property named "design" would be mapped by a function to the data subtype "Design Spec."
[0066] Also, any data record formatted in accordance with the data structure 204 defined by the XML database schema 206 named "Shirt Fabrication" can be mapped to the data type 214 of the NVTH 134. In this case, the data subtype for "data description language" would be XML, the data subtype for "SN" would be "Shirt Fabrication," and the data subtype "DTR" would be schema transformation rules to and from the XML database schema 206 and the data type 214 of the NVTH 134. In accordance with the schema transformation rules, the "piece 1," "piece 2," "piece 3," "piece 4," "piece 5," "piece 6," "piece 7," and "piece 8" of the data records in the data structure 204 would be mapped to the data subtypes "L front panel," "L sleeve," "L cufflink," "R front panel," "R sleeve," "R cufflink," "B panel," and "shirt collar," respectively, of the data type 214. Furthermore, the properties named "design number" and "fabric type" would be mapped by a function to the data subtype "Design Spec." The NVTH 134 can therefore be used to encapsulate the data 112 in the JSON database 110 and the data 112 in the XML database 110. In this manner, requests from both the Java client application program 104 and the XML client application program 104 can be handled for both the data 112 in the JSON database 110 and the data 112 in the XML database 110 without regard to the particular syntax and scheme defined by the database schemas 202, 206. (Throughout this disclosure, data is "encapsulated" when the implementation details of a data structure, data substructure, and/or data item are hidden and/or protected from outside access.)
[0067] Figure 4 visually represents exemplary data used to perform transformations with the NVTH 134. In this embodiment, the JSON database 110 stores the data structure 300 formatted in accordance with the JSON database schema 202 entitled "Shirt Fabric 1," which was described in detail above. In this example, the data structure 300 includes data objects 302, 304, which each have values for each of the properties assigned by the JSON database schema 202 entitled "Shirt Fabric 1." These data objects 302, 304 are therefore data substructures that are formatted in accordance with the JSON database schema 202 (entitled "Shirt Fabric 1") in the data structure 300.
[0068] In this embodiment, each of the data objects 302, 304 is a data output from manufacturing equipment 306. The manufacturing equipment 306 includes the client computer 102 that implements the Java client application program 104, which generates each of the data objects 302, 304 to model materials and/or devices output from the manufacturing equipment 306. For example, the manufacturing equipment 306 may be an industrial textile cutting machine, which takes a large piece of fabric and cuts it into shapes that can be sewn together to make a particular garment. The properties of the data object 302 thereby model the particular fabric pieces cut out of a fabric for a particular shirt design and include bar code IDs for each of the bar codes provided with each piece of fabric. The data object 304 is structured in the same manner but models fabric pieces output from the manufacturing equipment 306 for a different shirt design and during a different manufacturing cycle. In one example, the client computer 102 that implements the Java client application program 104 generates the data objects 302, 304 after their respective manufacturing cycles and sends them to the server computer 106 for storage in the JSON database 110.
[0069] Referring now to Figure 1, Figure 2, Figure 3A, Figure 3B, and Figure 4, in this embodiment, the fabric pieces cut by the manufacturing equipment 306 (an industrial cutter) are input materials for the manufacturing equipment 308, which in this example is an automated industrial sewing machine. The manufacturing equipment 308 includes the client computer 102 that implements the XML client application program 104. To model the fabric pieces from the manufacturing equipment 306, the XML client application program 104 requires that the data input modeling the fabric pieces be formatted in accordance with the XML database schema 206 entitled "Shirt Fabrication." Thus, without the API 136 and NVTH 134 described herein, the XML client application program 104 of the client computer 102 in the manufacturing equipment 308 would not be able to communicate with the JSON database 110. Accordingly, the XML client application program 104 would not be able to use the information in the data objects 302, 304 without the API 136 and NVTH 134, since these data objects 302, 304 are formatted in accordance with the JSON database schema 202 (which is incompatible with the XML client application program 104). Typically, in previously known solutions, a human would have to enter user input including blueprint identifiers, fabric identifiers, and bar codes in order for the manufacturing equipment 308 to operate, since the data output from the Java client application program 104 is incompatible with the XML client application program 104 and the XML database 110, which use XML database schemas 206, such as the XML database schema 206 entitled "Shirt Fabrication." However, the API 136 of this embodiment uses the NVTH 134 to provide database schema transformations. Thus, the transfer of the fabric pieces from the manufacturing equipment 306 and the information from the data objects 302, 304 can be entirely automated despite the differences in database schemas 202, 206. It should be noted that XML-based data transformations may be generated as extensible stylesheet language transformations (XSLT).
[0070] The server computer 106 is configured to receive a data substructure formatted in accordance with one of the database schemas 202 and transform a data item based on the data substructure into an equivalent data substructure formatted in accordance with one of the database schemas 206. (Throughout this disclosure, a "data item” may be any set of data formatted in accordance with some kind of defined data archetype. For example, a data item may be a data object, a data record, a data file, a data table, a defined combination of one or more of the data components listed herein, and/or the like.) Additionally, the server computer 106 is configured to receive a data substructure formatted in accordance with one of the database schemas 206 and transform a data item based on the data substructure into an equivalent data substructure formatted in accordance with one of the database schemas 202. This is because the data structures formatted in accordance with the database schemas 202 and the data substructures formatted in accordance with the database schemas 206 correspond with data types of the NVTH 134.
[0071] In one example, the data items transformed are the data substructures themselves, while in other examples the data item may be the output of a function implemented by the API 136, as explained in further detail below. For instance, upon receiving the data object 302 from the Java client application program 104, the API 136 of the server computer 106 may transform the data object 302 into the data record 310 formatted in accordance with the database schema 206 entitled "Shirt Fabrication." To transform the data object 302, the API 136 of the server computer 106 may encapsulate the data object 302 in accordance with the data type entitled "Shirt Fabrication Fabric" (shown in Figure 2) of the NVTH 134. In one embodiment, the API 136 encapsulates the data object 302 by recognizing that the data object 302 corresponds to the data type 214 entitled "Shirt Fabrication Fabric" of the NVTH 134 and then mapping the data object 302 to a buffer data item 312 formatted in accordance with the NVTH 134. In this case, the data subtype for "data description language" of the buffer data item 312 would be JSON, the data subtype for "SN" of the buffer data item 312 would be "Shirt Fabric 1," and the data subtype "DTR" would be schema transformation rules to and from the JSON database schema 202 and the data type 214 of the NVTH 134. In accordance with the schema transformation rules, the measurement and bar code ID values of the "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar" of the data object 302 in the data structure 300 would be mapped to the measurement values and bar code ID values of the data subtypes "L front panel," "L sleeve," "L cufflink," "R front panel," "R sleeve," "R cufflink," "B panel," and "shirt collar," respectively. Furthermore, the property named "design" would be mapped by a function to the data subtype "Design Spec."
[0072] The data object 302 is thereby encapsulated as the buffer data item 312 without regard to the syntax and specifics of JSON. The API 136 of the server computer 106 may then use the schema transformation rules to transform the buffer data item 312 into the data record 310 formatted in accordance with the database schema 206 entitled "Shirt Fabrication." For example, the API 136 may map the buffer data item 312 to the data record 310 since the data record 310 corresponds to the data type 214. In accordance with the schema transformation rules, the measurement and bar code ID values of the data subtypes "L front panel," "L sleeve," "L cufflink," "R front panel," "R sleeve," "R cufflink," "B panel," and "shirt collar" are mapped to the "piece 1," "piece 2," "piece 3," "piece 4," "piece 5," "piece 6," "piece 7," and "piece 8" of the data record 310, respectively. Furthermore, the values of the data subtype named "Design Spec" are mapped by a function to the properties "design number" and "fabric type" of the data record 310. The data object 304 is transformed into the data record 314 in the same manner.
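The two-step transformation described above (a JSON data object mapped to a buffer data item, which is then mapped to an XML data record) can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the function names, the XML element and attribute names, the sample values, and in particular the assumed format of the "design" string (design number, a slash, then fabric type) are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Property and subtype names from the description, in corresponding order.
JSON_PROPS = ["left front panel", "left sleeve", "left cufflink",
              "right front panel", "right sleeve", "right cufflink",
              "back panel", "collar"]
SUBTYPES = ["L front panel", "L sleeve", "L cufflink",
            "R front panel", "R sleeve", "R cufflink",
            "B panel", "shirt collar"]
XML_PROPS = [f"piece {n}" for n in range(1, 9)]


def split_design(design):
    """Assumed function relating "design" to "design number" and "fabric type"."""
    number, fabric = design.split("/", 1)
    return {"design number": number, "fabric type": fabric}


def encapsulate_json(text):
    """Map a JSON data object to a buffer data item shaped like the data type."""
    obj = json.loads(text)
    buffer_item = {
        "data description language": "JSON",
        "SN": "Shirt Fabric 1",
        "Design Spec": split_design(obj["design"]),
    }
    for prop, subtype in zip(JSON_PROPS, SUBTYPES):
        buffer_item[subtype] = obj[prop]
    return buffer_item


def to_xml_record(buffer_item):
    """Map a buffer data item to an XML record (element names are made up)."""
    root = ET.Element("record", {"schema": "Shirt Fabrication"})
    for name in ("design number", "fabric type"):
        prop = ET.SubElement(root, "property", {"name": name})
        prop.text = buffer_item["Design Spec"][name]
    for subtype, name in zip(SUBTYPES, XML_PROPS):
        piece = buffer_item[subtype]
        prop = ET.SubElement(root, "property", {"name": name})
        ET.SubElement(prop, "barCodeID").text = piece["bar code ID"]
        for cm in piece["measurement"]:
            ET.SubElement(prop, "measurement").text = str(cm)
    return ET.tostring(root, encoding="unicode")


# Sample data object with made-up values.
data_object = {"design": "D-17/cotton"}
for i, prop in enumerate(JSON_PROPS, start=1):
    data_object[prop] = {"bar code ID": f"BC{i:03d}",
                         "measurement": [10 * i, 20 * i]}

xml_text = to_xml_record(encapsulate_json(json.dumps(data_object)))
```

Because both directions pass through the buffer data item, supporting another schema would only require one more pair of mappings to and from the data type rather than a converter for every pair of schemas.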
[0073] In the embodiment shown in Figure 4, the server computer 106 may store the data records 310, 314 in a data structure 316 formatted in accordance with the database schema 206 entitled "Shirt Fabrication." Accordingly, the XML client application program 104 implemented by the client computer 102 in the manufacturing equipment 308 may make a read request to the XML database 110 and obtain the data record 310 and/or the data record 314 that model the fabric pieces output from the manufacturing equipment 306.
[0074] In alternative embodiments, the data records 310, 314 are not stored in the XML database 110. Rather, the XML client application program 104 makes requests to the JSON database 110 for the data objects 302, 304. As a response to the requests, the API 136 of the server computer 106 performs the above-described transformation to provide the data records 310, 314 to the XML client application program 104. In still other embodiments, the data objects 302, 304 are not stored in the JSON database 110. Instead, the Java client application program 104 implemented by the client computer 102 in the manufacturing equipment 306, 308 makes a write request to the XML database 110. The data objects 302, 304 would then be transformed by the API 136 into the data records 310, 314, as described above, and then the data records 310, 314 would be stored in the XML database 110.
[0075] The API 136 of the server computer 106 may also perform data functions for the Java and/or the XML client application programs 104. The data functions may receive a data input from the JSON database 110 and provide a data output for the XML client application program 104. Additionally, the data functions may receive a data input from the XML database 110 and provide a data output for the Java client application program 104.
[0076] In one embodiment, the API 136 of the server computer 106 is configured to implement data functions having a data substructure from the JSON database 110 as an input so as to generate a data item formatted in accordance with one of the database schemas 202. The API 136 of the server computer 106 is then configured to transform the data item into a data substructure formatted in accordance with one of the XML database schemas 206. For example, the API 136 of the server computer 106 is configured to implement a data function having one of the data objects 302, 304 as input. The data function generates a data item 322 having the "left front panel," "left sleeve," "left cufflink," "right front panel," "right sleeve," "right cufflink," "back panel," and "collar" properties of the particular input data object 302, 304 in the order that the fabric pieces corresponding to these properties are to be received by the manufacturing equipment 306, 308.
[0077] The API 136 of the server computer 106 will then encapsulate the data item 322 in accordance with the data subtypes "L front panel," "L sleeve," "L cufflink," "R front panel," "R sleeve," "R cufflink," "B panel," and "shirt collar" in the data type 214 of the NVTH 134. The API 136 may then transform the encapsulated data item 322 into an equivalent data record 324 formatted in accordance with the "piece 1," "piece 2," "piece 3," "piece 4," "piece 5," "piece 6," "piece 7," and "piece 8" of the database schema 206 entitled "Shirt Fabrication." The equivalent data record 324 may then be provided to the XML client application program 104.
[0078] Also, the API 136 of the server computer 106 is configured to implement data functions having a data substructure from the XML database 110 as an input so as to generate a data item formatted in accordance with one of the database schemas 206. The API 136 of the server computer 106 is then configured to transform the data item into a data substructure formatted in accordance with one of the JSON database schemas 202. For example, the API 136 of the server computer 106 is configured to implement a data function having a design number and fabric type from the data record 310 as inputs. The data function generates a data item 326 that has matching design numbers and fabric types. The API 136 transforms the data item 326 into a data object 328 formatted in accordance with the database schema 202 entitled "Shirt Fabric 1" (see Figure 3A) using the data type 214 of the NVTH 134, as described above. The data object 328 may then be provided to the Java client application program 104.
[0079] Figure 5 illustrates exemplary procedures of a machine learning method that may be implemented by a computer system (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
[0080] To implement the machine learning method, the computer system 100 may provide one or more databases that store data structures formatted in accordance with database schemas (procedure 400). Each of the database(s) includes at least one of the data structures formatted in accordance with at least one of the database schemas. A data structure may be any type of data organized in accordance with a database schema. A data substructure may be any subset of the data in one or more data structures. For example, a data structure may be a collection of data objects or data records formatted in accordance with a database schema. A data structure may also be data objects or data records along with lists, tables, metadata, and other associated digital information related to the data objects and data records. Additionally, a data substructure may be some subset of the data object or data records, some subset of properties, fields, or attributes in one or more of the data object or data records, along with any associated digital information if any.
[0081] To determine the relationships between the database schemas, the computer system 100 may implement a machine learning network on the computer system 100 to identify a plurality of equivalent data substructures in the data structures defined by the database schemas (procedure 402). In some implementations, at least some of the database schemas are defined in different data description languages. Thus, implementing the machine learning network on the computer system 100 includes identifying equivalent data substructures in the data structures formatted in accordance with database schemas defined in the different data description languages. In other implementations, the database schemas may be defined by the same data description language.
[0082] In addition, in some implementations, the machine learning network predicts the equivalency between data substructures and a user confirms or rejects the equivalency. For example, the computer system 100 may implement the machine learning network to identify data substructures in the data structures that the machine learning network predicts are equivalent and receive user input indicating that one or more of the identified data substructures are one or more of the equivalent data substructures. In still other implementations, the machine learning network identifies the equivalent substructures in an entirely automated manner.
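The predict-then-confirm flow described above can be sketched as a small review loop. The sketch assumes that the machine learning network has already produced candidate pairs with scores; how those predictions are produced is outside this example, and the names `review_predictions`, `confirm`, and `automated` are hypothetical.

```python
from typing import Callable, List, Tuple

# (substructure in first schema, substructure in second schema, model score)
Prediction = Tuple[str, str, float]


def review_predictions(
    predictions: List[Prediction],
    confirm: Callable[[str, str, float], bool],
    automated: bool = False,
) -> List[Tuple[str, str]]:
    """Return the pairs accepted as equivalent data substructures.

    When `automated` is True every prediction is accepted without user
    input; otherwise `confirm` stands in for the user's confirm/reject input.
    """
    accepted = []
    for first, second, score in predictions:
        if automated or confirm(first, second, score):
            accepted.append((first, second))
    return accepted


# Made-up predictions; the second one is deliberately wrong.
predictions = [
    ("left sleeve", "piece 2", 0.93),
    ("collar", "piece 7", 0.41),
]

# Stand-in for user input: reject the incorrect pairing.
def user_confirms(first, second, score):
    return (first, second) != ("collar", "piece 7")

confirmed = review_predictions(predictions, user_confirms)
```

In an interactive system the `confirm` callback would be replaced by whatever prompt the input/output devices provide.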
[0083] The machine learning network may be any suitable type of machine learning network, as described above. In some implementations, the machine learning network is implemented entirely as a software program. In other implementations, the machine learning network may be implemented entirely by specially designed hardware. In still other implementations, the machine learning network is implemented in some combination of software and specially designed hardware.
[0084] The computer system 100 may construct an NVTH that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network (procedure 404). In some implementations, determining the relationships between the database schemas is entirely automated but the naming of the data types of the NVTH is only partially automated. Thus, constructing the NVTH includes presenting user output to a user on the computer system 100 where the user output describes the equivalent data substructures defined by the database schemas. Furthermore, the computer system 100 may receive user input from the user on the computer system 100 where the user input semantically describes the data types corresponding to the one or more equivalent data substructures. In other implementations, the naming is entirely automated.
[0085] In some embodiments, the computer system 100 may initially be configured to train the machine learning network with test data and target results data (procedure 406). Thus, prior to procedure 402 (and/or procedure 400), the computer system 100 may perform procedure 406. Once trained, the machine learning network can be implemented to identify equivalent substructures. It should, however, be noted that, in some embodiments, the machine learning network does not require training and may be capable of identifying equivalent substructures dynamically and in real-time. This, of course, may depend on the processing power of the computer system 100 and the capabilities of the particular machine learning network being implemented.
[0086] Figure 6 illustrates exemplary procedures of a schema transformation method that may be implemented by a computer system (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system 100. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
[0087] With regard to the exemplary schema transformation method, the computer system 100 may receive a first data substructure formatted in accordance with a first database schema (procedure 500). The first data substructure, which corresponds with a first data type of the NVTH, may be received from a database or from a client application program. The computer system 100 may then transform a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure corresponds with the first data type of the NVTH (procedure 502). In some examples, the first database schema is defined in a first data description language and the second database schema is defined in a second data description language that is different from the first data description language. Alternatively, the first database schema and the second database schema may be defined in the same data description language.
[0088] In some implementations, the data item is the first data substructure itself. In this case, the computer system 100 transforms the first data substructure into the second data substructure at procedure 502. To do this, the computer system 100 may encapsulate the first data substructure in accordance with the first data type of the NVTH and transform the encapsulated first data substructure into the second data substructure formatted in accordance with the second database schema. The computer system 100 may encapsulate the first data substructure by recognizing that the first data substructure corresponds to the first data type of the NVTH and mapping the first data substructure to a buffer data item formatted in accordance with the first data type, such that the buffer data item is the encapsulated first data substructure.
[0089] Alternatively, the data item may be the output of a data function rather than the first data substructure itself. In this case, prior to procedure 502, the computer system 100 may implement a function having the first data substructure as an input so as to generate the data item such that the data item is formatted in accordance with the first database schema (procedure 504). The computer system 100 may then perform procedure 502 by encapsulating the data item in accordance with the first data type and transforming the encapsulated data item into the equivalent second data structure.
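The variant with procedure 504 may be sketched as follows. The function `add_seam_allowance`, the helper `run_pipeline`, and the field names `cutLengthMm`, `length_mm`, and `seam_length` are hypothetical and chosen only for illustration. The point shown is the ordering: the function runs on the first data substructure, its output stays in the first schema, and only then is the data item encapsulated and transformed.

```python
from typing import Any, Callable, Dict

Substructure = Dict[str, Any]


def add_seam_allowance(sub: Substructure, allowance_mm: int = 10) -> Substructure:
    """Hypothetical data function; its output keeps the first schema's fields."""
    return {**sub, "cutLengthMm": sub["cutLengthMm"] + allowance_mm}


def run_pipeline(first: Substructure,
                 fn: Callable[[Substructure], Substructure],
                 to_nvth: Dict[str, str],
                 from_nvth: Dict[str, str]) -> Substructure:
    # Procedure 504: generate the data item from the first data substructure.
    data_item = fn(first)
    # Procedure 502, step 1: encapsulate per the NVTH data type.
    encapsulated = {to_nvth[name]: value for name, value in data_item.items()}
    # Procedure 502, step 2: transform into the second schema.
    return {from_nvth[name]: value for name, value in encapsulated.items()}


result = run_pipeline(
    {"cutLengthMm": 500},
    add_seam_allowance,
    {"cutLengthMm": "length_mm"},
    {"length_mm": "seam_length"},
)
```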
[0090] In some applications, the first data substructure stored by the first database is a data output from first manufacturing equipment and the equivalent second data substructure is a data input to a second database for second manufacturing equipment. While the specific examples discussed above with respect to Figure 4 were related to an industrial fabric cutter and an automated industrial sewing machine, the above-described systems and methods are applicable whenever different manufacturing equipment needs to work together despite receiving and/or transmitting data inputs or data outputs formatted in accordance with incompatible database schemas. More broadly, the systems and methods described herein increase the efficiency and interoperability of computer systems with data stored in accordance with incompatible database schemas. This enhances the operation of the computer system and allows heterogeneous database programs to use data regardless of the database schema.
[0091] Referring now to Figure 1 and Figure 7, Figure 7 illustrates the JSON database 110 that stores data 112 involved in the construction of an NVTH 700 (see Figure 1) and the XML database 110 that stores data 112 involved in the construction of an NVTH 702 (see Figure 1). In this embodiment, the NVTH 700 and the NVTH 702 shown in Figure 1 are both constructed by the server computer 106 using the machine learning network 132 shown in Figure 2. In this example, the NVTH 700 is constructed using the JSON database 110 while the NVTH 702 is constructed using the XML database 110. However, it should be noted that the same principles and techniques described herein are applicable to construct examples of the NVTH 700 and the NVTH 702 that canonicalize the data 112 of any combination of the databases 110 (e.g., the SQL database 110 and/or the Codasyl database 110 in Figure 1). Furthermore, while the exemplary data description languages in the example described above in Figure 1 are JSON, XML, SQL, and Codasyl, the described techniques are equally applicable to data 112 formatted in accordance with database schemas defined in accordance with other types of data description languages such as Python, PHP, Ruby, and Perl, just to name a few. Furthermore, the data 112 does not necessarily have to be on separate databases 110. For example, some databases 110 are capable of storing data 112 formatted using different data description languages in separate domains. Finally, the described techniques can be utilized whenever subsets of the data 112 are not mutually exclusive but are schematically incompatible. These and other applications for the principles and techniques described herein would be apparent to one of ordinary skill in the art in light of this disclosure.
[0092] In this example, however, it is presumed that the database schemas for the data 112 in the JSON database 110 and for the data 112 in the XML database 110 are not provided to the computer system 100. Instead, as shown in Figure 7, the data 112 stored in the JSON database 110 is organized as data structures 704, wherein each of these data structures 704 is formatted in JSON and is a portion of the data 112 in the JSON database 110. There may be no known or digitally recorded schema for the data structures 704. In this example, the data structures 704 provide JSON data for pieces of automobiles and sections and parts of automobiles from a car manufacturer, which we will call ACME. For instance, one of the data structures may include information regarding certain car models manufactured by ACME, design information regarding the motors of car models manufactured by ACME, and model information for car parts of cars manufactured by ACME.
[0093] With regard to the XML database 110, the data 112 stored in the XML database 110 is organized as data structures 706, wherein each of these data structures 706 is formatted in XML and is a portion of the data 112 in the XML database 110. There may be no known or digitally recorded schema for the data structures 706. In this example, the data structures 706 provide XML data for pieces of automobiles and sections and parts of automobiles from another car manufacturer, which we will call CARX. For instance, one of the data structures may include information regarding certain car models manufactured by CARX, design information regarding the motors of car models manufactured by CARX, and model information for car parts of cars manufactured by CARX.
[0094] Some of the data structures 704 in the JSON database 110 may be equivalent to some of the data structures 706 in the XML database 110. For instance, ACME and CARX may use the same seats, windshields, engine parts, tires, etc. for some of their car models. However, because the database schemas for the data structures 704, 706 are unknown, there is no way to know which data is the same or otherwise equivalent. To solve this problem, database schemas must first be constructed for both the data structures 704 and the data structures 706, even though they are in different data description languages. These database schemas provide a canonicalization of the format of the data structures 704, 706 so that the formats of the data structures 704, 706 can be compared.
[0095] In this regard, the server computer 106 is configured to implement the machine learning network 132 so that the machine learning network 132 constructs the NVTH 700 from the data structures 704 in the JSON database 110. The machine learning network 132 may be trained and may seek user input (as described above) in order to construct the NVTH 700. The NVTH 700 thus provides a learned database schema for the data structures 704 based on the information analyzed by the machine learning network 132 regarding these data structures 704. The NVTH 700 thus canonicalizes the data structures 704.
[0096] Figure 8A illustrates a detailed example of the NVTH 700, this time presuming that the NVTH 700 is canonicalizing the data structures 704 regarding the data 112 from ACME. The NVTH 700 that is shown in detail in Figure 8A is a data object archetype named "ACME AUTOMOBILE." For the sake of explanation, the NVTH 700 is highly simplified. The NVTH 700 is of the JSON type and includes data types such as a string named "car design" to identify the model of a car and six different properties named "engine type," "front seat type," "back seat type," "trunk design," "car door combination," and "radiator type." Each of these properties is associated with an integer value that identifies the particular type of car part. These names and data types may be learned by the machine learning network 132 to identify a workable format for the data structures 704. Note that "car design," "engine type," "front seat type," "back seat type," "trunk design," "car door combination," and "radiator type" are subschemas of the schema provided by the NVTH 700.
[0097] It should be noted that some instances of the data structures 704 may have included only data that corresponds with the "engine type." Other instances of the data structures 704 may have included data that corresponds to only the "front seat type," "back seat type," "trunk design," "car door combination," or "radiator type," respectively. Still other instances of the data structures 704 may have included data that corresponds to the "car design" along with data that corresponds to all of the "engine type," "front seat type," "back seat type," "trunk design," "car door combination," and "radiator type." The machine learning network 132 thus analyzes the data structures 704 to provide a canonicalization of the data and produce a format for the data structures 704, which is provided as the NVTH 700. In this embodiment, although the data structures 704 are themselves formatted in JSON, the NVTH 700 may be provided in a common data description language, such as Java.
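The way partial instances combine into a single learned format may be illustrated with a deliberately simple, deterministic stand-in for the machine learning network 132. The sketch below is an assumption for illustration only: it merely unions the property names and value types observed across JSON instances, which is far less than the disclosed network does. The function name `infer_schema` and the sample values (such as "sedan" and the integer codes) are invented.

```python
import json
from typing import Dict, Iterable


def infer_schema(documents: Iterable[str]) -> Dict[str, str]:
    """Union the property names and value types seen across JSON instances."""
    schema: Dict[str, str] = {}
    for text in documents:
        record = json.loads(text)
        for name, value in record.items():
            type_name = type(value).__name__
            previous = schema.setdefault(name, type_name)
            if previous != type_name:
                raise ValueError(
                    f"conflicting types for {name!r}: {previous} vs {type_name}"
                )
    return schema


# Invented sample instances: two partial records and one complete record.
samples = [
    '{"engine type": 3}',
    '{"radiator type": 1}',
    '{"car design": "sedan", "engine type": 3, "front seat type": 2,'
    ' "back seat type": 2, "trunk design": 5, "car door combination": 4,'
    ' "radiator type": 1}',
]

ACME_SCHEMA = infer_schema(samples)
```

Here `ACME_SCHEMA` ends up with one string property ("car design") and six integer properties, mirroring the simplified NVTH 700 described for Figure 8A.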
[0098] With regard to the NVTH 702, the server computer 106 is also configured to implement the machine learning network 132 so that the machine learning network 132 constructs the NVTH 702 from the data structures 706 in the XML database 110. The machine learning network 132 may be trained and may seek user input (as described above) in order to construct the NVTH 702. The NVTH 702 thus provides a learned database schema for the data structures 706 based on the information analyzed by the machine learning network 132 regarding these data structures 706. The NVTH 702 thus canonicalizes the data structures 706.
[0099] Figure 8B illustrates a detailed example of the NVTH 702, this time presuming that the NVTH 702 is canonicalizing the data structures 706 regarding the data 112 from CARX. The NVTH 702 that is shown in detail in Figure 8B is a data object archetype named "CARX AUTOMOBILE." For the sake of explanation, the NVTH 702 is highly simplified. The NVTH 702 is of the XML type and includes data types such as an integer named "car type" to identify the model of a car and six different properties named "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type." Each of these properties is associated with an integer value that identifies the particular type of car part. These names and data types may be learned by the machine learning network 132 to identify a workable format for the data structures 706. Note that "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type" are subschemas of the schema provided by the NVTH 702.
[00100] It should be noted that some instances of the data structures 706 might have included only data that corresponds with the "engine type." Other instances of the data structures 706 may have included data that corresponds to only the "seat combination type," "trunk type," "front car doors," "back car doors," or "radiator type," respectively. Still other instances of the data structures 706 may have included data that corresponds to the "car type" along with data that corresponds to all of the "engine type," "seat combination type," "trunk type," "front car doors," "back car doors," and "radiator type." The machine learning network 132 thus analyzes the data structures 706 to provide a canonicalization of the data and produce a workable format for the data structures 706, which is provided as the NVTH 702. In this embodiment, although the data structures 706 are themselves formatted in XML, the NVTH 702 may be provided in a common data description language, such as Java.
[00101] Now that the NVTH 700 and the NVTH 702 provide database schemas for the data structures 704 and the data structures 706, respectively, the server computer 106 can analyze the NVTH 700 and the NVTH 702 (along with the data structures 704, 706) to determine any equivalent subschemas between the NVTH 700 and the NVTH 702. In this manner, the server computer 106 can learn to translate between the data structures 704 and the data structures 706. More specifically, the server computer 106 is configured to implement the machine learning network 132 to learn to transform between the subschemas in the NVTH 700 and the subschemas in the NVTH 702 when the subschemas of the NVTH 700 and the subschemas of the NVTH 702 are equivalent.
[00102] In this example, the "engine type" of the NVTH 700 is equivalent to the "engine type" of the NVTH 702. The combination of the "front seat type" and "back seat type" of the NVTH 700 is equivalent to the "seat combination type" of the NVTH 702. The "trunk design" of the NVTH 700 is equivalent to the "trunk type" of the NVTH 702. The "car door combination" of the NVTH 700 is equivalent to the combination of the "front car doors" and the "back car doors" of the NVTH 702. Finally, the "radiator type" of the NVTH 700 is equivalent to the "radiator type" of the NVTH 702.
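The equivalences listed in this example include one-to-one, two-to-one, and one-to-two relationships, which can be recorded as a mapping between groups of subschema names. The following Python sketch is only one possible representation; the names `EQUIVALENT_SUBSCHEMAS` and `equivalent_in_702` are hypothetical, while the subschema names themselves come from the example.

```python
from typing import Dict, Tuple

# Keys: groups of subschema names in the NVTH 700 (ACME).
# Values: the equivalent groups of subschema names in the NVTH 702 (CARX).
EQUIVALENT_SUBSCHEMAS: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("engine type",): ("engine type",),
    ("front seat type", "back seat type"): ("seat combination type",),
    ("trunk design",): ("trunk type",),
    ("car door combination",): ("front car doors", "back car doors"),
    ("radiator type",): ("radiator type",),
}


def equivalent_in_702(name_700: str) -> Tuple[str, ...]:
    """Return the NVTH 702 subschema group equivalent to an NVTH 700 name."""
    for group_700, group_702 in EQUIVALENT_SUBSCHEMAS.items():
        if name_700 in group_700:
            return group_702
    raise KeyError(name_700)
```

Note that "car design" has no entry, because the example does not state an equivalence between it and the "car type" of the NVTH 702.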
[00103] By analyzing the data structures 704, the data structures 706, the NVTH 700, and the NVTH 702, the machine learning network 132 can identify the equivalent data substructures and the equivalent subschemas in the NVTH 700 and the NVTH 702. Furthermore, the machine learning network 132 learns transformation rules for transforming the subschemas of the NVTH 700 and the NVTH 702 based on the relationships between the data structures 704 and the data structures 706 and between the NVTH 700 and the NVTH 702. In this manner, the server computer 106 is configured to implement the machine learning network 132 that learns to transform between the subschemas in the NVTH 700 and the subschemas in the NVTH 702. Thus, the server computer 106 may receive one of the data structures 704 from the JSON client computer 102 as a data substructure 704 of the NVTH 700. Furthermore, the subschema that corresponds to the format of the data substructure 704 is equivalent to one of the subschemas in the NVTH 702. For example, the data substructure 704 may be formatted in accordance with the "trunk design" of the NVTH 700, which is equivalent to the "trunk type" of the NVTH 702.
[00104] The server computer 106 may implement the machine learning network 132 to transform the data substructure 704 into a data substructure 706 formatted in accordance with the equivalent subschema of the NVTH 702. For example, assuming that the data substructure 704 is of the "trunk design" of the NVTH 700, the machine learning network 132 would generate a data substructure 706 that is formatted in accordance with the "trunk type" of the NVTH 702, where the data substructure 706 is equivalent to the data substructure 704. The machine learning network 132 would utilize learned transformation rules to provide the transformation. In the aforementioned example, integers for codes of the "trunk design" of the NVTH 700 may be transformed into integers for codes of the "trunk type" of the NVTH 702.
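The trunk example may be sketched end to end in Python, taking a JSON-formatted data substructure 704 and emitting an XML-formatted data substructure 706. Everything specific here is an assumption made for illustration: the code values in `TRUNK_CODE_RULE` are invented, a fixed lookup table stands in for the learned transformation rules, and the XML element name `trunk_type` is hypothetical (XML element names cannot contain spaces).

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict

# Invented stand-in for a learned rule:
# ACME "trunk design" code -> CARX "trunk type" code.
TRUNK_CODE_RULE: Dict[int, int] = {1: 10, 2: 20, 5: 11}


def trunk_json_to_xml(json_text: str) -> str:
    """Transform a JSON "trunk design" substructure into an XML trunk type."""
    record = json.loads(json_text)
    code = record["trunk design"]
    if code not in TRUNK_CODE_RULE:
        raise LookupError(f"no rule for trunk design code {code}")
    element = ET.Element("trunk_type")
    element.text = str(TRUNK_CODE_RULE[code])
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(element, encoding="unicode")


xml_text = trunk_json_to_xml('{"trunk design": 5}')
# xml_text == '<trunk_type>11</trunk_type>'
```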
[00105] It should be noted that, in this embodiment, the machine learning network 132 implemented by the server computer 106 constructs the NVTHs 700, 702 and learns to provide the transformations so that data substructures can be transformed. This may or may not be the case and in other implementations, other computer devices on other computer systems may construct the NVTHs 700, 702 and learn to provide the transformations. For example, in an alternative embodiment, a second machine learning network on another computer system may be implemented so that the second machine learning network constructs the NVTH 700 from the data structures 704 and constructs the NVTH 702 from the data structures 706. The server computer 106 may then receive the NVTHs 700, 702 and the data structures 704, 706 and implement the machine learning network 132 to learn to transform between the subschemas of the NVTHs 700, 702. In yet another alternative embodiment, a second machine learning network on another computer system may be implemented so that the second machine learning network constructs the NVTH 700 from the data structures 704 and a third machine learning network on still another computer system may be implemented so that the third machine learning network constructs the NVTH 702 from the data structures 706. The server computer 106 may then receive the NVTHs 700, 702 and the data structures 704, 706 and implement the machine learning network 132 to learn to transform between the subschemas of the NVTHs 700, 702. In still other implementations, the server computer 106 may implement the machine learning network 132 to construct the NVTHs 700, 702 while another machine learning network on another computer system learns to transform between the subschemas of the NVTHs 700, 702. These and other implementations would be apparent to one of ordinary skill in the art in light of this disclosure.
[00106] Figure 9 illustrates exemplary procedures of a machine learning method that may be implemented by a computer system (e.g., the computer system 100 shown in Figure 1). Different embodiments of these exemplary procedures may be implemented depending on the particular implementation details of the computer system. Furthermore, the order in which the procedures are presented is not intended to imply a required sequence for the procedures. Rather, the procedures may be implemented in a different sequence and/or some or all of the procedures may be implemented simultaneously.
[00107] To implement the machine learning method, the computer system 100 may provide one or more databases that store a first set of data structures and a second set of data structures (procedure 900). To determine the format of the first set of data structures and the second set of data structures, the computer system 100 may implement a machine learning network so that the machine learning network constructs a first NVTH from the first set of data structures (procedure 902). Additionally, the computer system 100 implements the machine learning network so that the machine learning network constructs a second NVTH from the second set of data structures (procedure 904). In some implementations, the first set of data structures and the second set of data structures are defined in different data description languages.
[00108] Next, the computer system 100 may implement the machine learning network so as to learn to transform between a first set of subschemas in the first NVTH and a second set of subschemas in the second NVTH, wherein the first set of subschemas and the second set of subschemas are equivalent (procedure 906). It should be noted that in alternative embodiments a second computer system might implement a second machine learning network so that the second machine learning network constructs the first NVTH from the first set of data structures and constructs the second NVTH from the second set of data structures. Additionally, in still other alternative embodiments a second computer system may implement a second machine learning network so that the second machine learning network constructs the first NVTH from the first set of data structures and a third computer system implements a third machine learning network to construct the second NVTH from the second set of data structures. These and other procedures would be apparent to one of ordinary skill in the art in light of this disclosure.
[00109] Once the machine learning network has learned to provide the transformations, the computer system 100 may receive a first data substructure that is formatted in accordance with a first corresponding one of the first set of subschemas, wherein the first corresponding one of the first set of subschemas is equivalent to a first corresponding one of the second set of subschemas (procedure 908). In turn, the computer system 100 may implement the machine learning network on the computer system to transform the first data substructure into a second data substructure formatted in accordance with the first corresponding one of the second set of subschemas (procedure 910).
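The learn-then-apply sequence of procedures 906 through 910 may be sketched as follows. This is a toy stand-in, not the machine learning network: it assumes that pairs of code values already known to be equivalent are available, and it simply memorizes them as a lookup table. The function names `learn_code_rule` and `apply_rule`, the subschema names, and the code values are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def learn_code_rule(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Procedure 906 (toy version): memorize equivalent code pairs."""
    rule: Dict[int, int] = {}
    for source_code, target_code in pairs:
        if rule.setdefault(source_code, target_code) != target_code:
            raise ValueError(f"inconsistent examples for code {source_code}")
    return rule


def apply_rule(rule: Dict[int, int],
               first_name: str,
               second_name: str,
               first_substructure: Dict[str, int]) -> Dict[str, int]:
    """Procedures 908 and 910: receive a substructure and transform it."""
    return {second_name: rule[first_substructure[first_name]]}


# Invented training pairs: (first-subschema code, second-subschema code).
rule = learn_code_rule([(1, 10), (2, 20), (1, 10)])
second_substructure = apply_rule(rule, "trunk design", "trunk type",
                                 {"trunk design": 2})
```

A lookup table cannot generalize to codes it has never seen, which is one reason a learned model may be preferable in practice.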
[00110] In some applications, the first data substructure stored by the first database is a data output from a car parts manufacturer, which may be formatted in accordance with the requirements of a particular car company (e.g., ACME as discussed above). The automobile manufacturer that is actually using the car part to make the automobile may require a different database format than the car parts manufacturer (e.g., a manufacturer for CARX may use the same part but a different database format). The machine learning network thus allows for transformations between the database formats so that the automobile manufacturer can use the part made by the car parts manufacturer in the automobile manufacturer's industrial automobile manufacturing process. The systems and methods described herein thereby increase the efficiency and interoperability of computer systems with data stored in accordance with incompatible database formats. This enhances the operation of the computer system and allows heterogeneous database programs to use data regardless of the database format and/or the data description language.
[00111] Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.

Claims

What is claimed is:
1. A machine learning method, comprising:
providing one or more databases that store data structures formatted in accordance with database schemas, wherein each of the one or more databases includes at least one of the data structures formatted in accordance with at least one of the database schemas;
implementing a machine learning network on a computer system to identify a plurality of equivalent data substructures in the data structures defined by the database schemas; and
constructing a name value type hierarchy (NVTH) that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network.
2. The machine learning method of claim 1, wherein at least two of the database schemas are defined in different data definition languages and wherein implementing the machine learning network on the computer system includes identifying equivalent data substructures in the data structures formatted in the at least two of the database schemas defined in the different data definition languages.
3. The machine learning method of claim 1, wherein constructing the NVTH comprises:
presenting user output to a user on the computer system, wherein the user output describes the plurality of equivalent data substructures defined by the database schemas; and
receiving user input from the user on the computer system, wherein the user input semantically describes the data types corresponding to the one or more of the plurality of equivalent data substructures.
4. The machine learning method of claim 1, further comprising training the machine learning network on the computer system with test data and target results data.
5. The machine learning method of claim 1, wherein implementing the machine learning network on the computer system comprises:
identifying data substructures in the data structures that the machine learning network predicts are equivalent; and
receiving user input indicating that one or more of the identified data substructures are the one or more of the plurality of equivalent data substructures.
6. The machine learning method of claim 1, further comprising:
receiving a first data substructure from a first database, wherein the first data substructure is formatted in accordance with a first database schema of the database schemas and wherein the first data substructure corresponds with a first data type of the one or more data types of the NVTH; and
encapsulating the first data substructure in accordance with the first data type of the one or more of the data types of the NVTH.
7. The machine learning method of claim 6, wherein encapsulating the first data substructure in accordance with the first data type of the one or more of the data types of the NVTH comprises:
recognizing that the first data substructure corresponds with the first data type; and
mapping the first data substructure to a data item formatted in accordance with the first data type such that the data item is the encapsulated first data substructure.
8. The machine learning method of claim 6, further comprising transforming the data item into an equivalent second data substructure formatted in accordance with a second database schema of the database schemas, wherein the second data substructure also corresponds with the first data type of the NVTH.
9. The machine learning method of claim 8, wherein transforming the encapsulated first data substructure into the equivalent second data substructure comprises mapping the data item to the equivalent second data substructure formatted in accordance with a second database schema of the database schemas.
10. The machine learning method of claim 1, further comprising:
receiving a first data substructure from a first database, wherein the first data substructure is formatted in accordance with a first database schema of the database schemas and corresponds to a first data type of the one or more data types of the NVTH;
implementing a function having the first data substructure as an input so as to generate a data item formatted in accordance with the first database schema; and
transforming the data item into an equivalent second data substructure formatted in accordance with a second database schema of the database schemas, wherein the second data substructure also corresponds with the first data type of the NVTH.
11. The machine learning method of claim 10, wherein transforming the data item into the second data substructure formatted in accordance with a second database schema of the database schemas comprises:
encapsulating the data item in accordance with the first data type; and
transforming the encapsulated data item into the equivalent second data substructure formatted in accordance with the second database schema.
12. The schema transformation method of claim 10, further comprising:
the first data substructure stored by the first database is a data output from first manufacturing equipment; and
the second data substructure is data input to second manufacturing equipment.
13. A schema transformation method, comprising:
receiving a first data substructure formatted in accordance with a first database schema, wherein the first data substructure corresponds with a first data type of a name value type hierarchy (NVTH); and
transforming a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure corresponds with the first data type of the NVTH.
14. The schema transformation method of claim 13, wherein the first database schema is defined in a first data definition language and the second database schema is defined in a second data definition language that is different from the first data definition language.
15. The schema transformation method of claim 13, wherein the data item is the first data substructure so that the first data substructure is transformed into the second data substructure and wherein transforming the first data substructure into the second data substructure comprises:
encapsulating the first data substructure in accordance with the first data type of the NVTH; and
transforming the encapsulated first data substructure into the second data substructure formatted in accordance with the second database schema.
16. The schema transformation method of claim 15, wherein encapsulating the first data substructure in accordance with the first data type of the NVTH comprises:
recognizing that the first data substructure corresponds to the first data type of the NVTH; and
mapping the first data substructure to a buffer data item formatted in accordance with the first data type such that the buffer data item is the encapsulated first data substructure.
17. The schema transformation method of claim 13, further comprising implementing a function having the first data substructure as an input so as to generate the data item such that the data item is formatted in accordance with the first database schema, wherein transforming the data item into the equivalent second data substructure comprises:
encapsulating the data item in accordance with the first data type; and
transforming the encapsulated data item into the equivalent second data structure.
18. The schema transformation method of claim 13, wherein:
the first data substructure stored is a data output from first manufacturing equipment; and
the second data substructure is a data input for second manufacturing equipment.
19. A computer system configured to be operably associated with one or more databases that store data structures formatted in accordance with database schemas, wherein each of the one or more databases includes at least one of the data structures formatted in accordance with at least one of the database schemas, the computer system comprising:
at least one memory, wherein the at least one memory stores computer-executable instructions;
one or more processors operably associated with the at least one memory, wherein, when executed by the one or more processors, the computer-executable instructions cause the one or more processors to:
implement a machine learning network that identifies a plurality of equivalent data substructures in the data structures defined by the database schemas; and
construct a name value type hierarchy (NVTH) that includes data types corresponding to one or more of the plurality of equivalent data substructures identified by the machine learning network.
20. A computer system, comprising:
at least one memory, wherein the at least one memory stores computer-executable instructions;
one or more processors operably associated with the at least one memory, wherein, when executed by the one or more processors, the computer-executable instructions cause the one or more processors to:
receive a first data substructure formatted in accordance with a first data structure, wherein the first data substructure corresponds with a first data type of a name value type hierarchy (NVTH); and
transform a data item based on the first data substructure into an equivalent second data substructure formatted in accordance with a second database schema that is different from the first database schema, wherein the second data substructure corresponds with the first data type of the NVTH.
PCT/US2020/019441 2019-02-23 2020-02-24 Computer systems and methods for database schema transformations WO2020172650A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962809599P 2019-02-23 2019-02-23
US62/809,599 2019-02-23
US201962816518P 2019-03-11 2019-03-11
US62/816,518 2019-03-11

Publications (1)

Publication Number Publication Date
WO2020172650A1 true WO2020172650A1 (en) 2020-08-27

Family

ID=72143898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/019441 WO2020172650A1 (en) 2019-02-23 2020-02-24 Computer systems and methods for database schema transformations

Country Status (1)

Country Link
WO (1) WO2020172650A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665677B1 (en) * 1999-10-01 2003-12-16 Infoglide Corporation System and method for transforming a relational database to a hierarchical database
US20050216893A1 (en) * 2004-03-25 2005-09-29 Anson Horton Proxy objects for display
US20100023925A1 (en) * 2008-07-24 2010-01-28 Sap Portals Israel Ltd System and method for transforming hierarchical objects
US20110066577A1 (en) * 2009-09-15 2011-03-17 Microsoft Corporation Machine Learning Using Relational Databases
US20160127322A1 (en) * 2014-10-29 2016-05-05 International Business Machines Corporation Masking data within json-type documents
US20180068001A1 (en) * 2007-11-29 2018-03-08 Bdna Corporation External system integration into automated attribute discovery
US20180067732A1 (en) * 2016-08-22 2018-03-08 Oracle International Corporation System and method for inferencing of data transformations through pattern decomposition

Similar Documents

Publication Publication Date Title
US8260824B2 (en) Object-relational based data access for nested relational and hierarchical databases
US7761586B2 (en) Accessing and manipulating data in a data flow graph
CN107491561B (en) Ontology-based urban traffic heterogeneous data integration system and method
US6704743B1 (en) Selective inheritance of object parameters in object-oriented computer environment
US20200334272A1 (en) Metadata hub for metadata models of database objects
TWI412945B (en) Retrieving and persisting objects from/to relational databases
Lu et al. Multi-model Data Management: What's New and What's Next?
US8527532B2 (en) Transforming function calls for interaction with hierarchical data structures
US5535325A (en) Method and apparatus for automatically generating database definitions of indirect facts from entity-relationship diagrams
US20180174057A1 (en) Methods and systems for providing improved data access framework
US9785725B2 (en) Method and system for visualizing relational data as RDF graphs with interactive response time
CN110019287B (en) Method and device for executing Structured Query Language (SQL) instruction
US20140136511A1 (en) Discovery and use of navigational relationships in tabular data
CN103678451A (en) Method and system for spreadsheet schema extraction
US11561976B1 (en) System and method for facilitating metadata identification and import
US20230091845A1 (en) Centralized metadata repository with relevancy identifiers
Parameswaran et al. Optimizing open-ended crowdsourcing: The next frontier in crowdsourced data management
US9665601B1 (en) Using a member attribute to perform a database operation on a computing device
EP4361841A1 (en) Data object management using data object clusters
CN111506779B (en) Object version and associated information management method and system facing data processing
US20160004754A1 (en) Generic API
US20100094893A1 (en) Query interface configured to invoke an analysis routine on a parallel computing system as part of database query processing
US12079251B2 (en) Model-based determination of change impact for groups of diverse data objects
JP2008505390A (en) Functional operation of access and / or construction of a federated tree datastore for use with application software
WO2020172650A1 (en) Computer systems and methods for database schema transformations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20758533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/01/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20758533

Country of ref document: EP

Kind code of ref document: A1