US20180329970A1 - Providing metadata to database systems and environments with multiple processing units or modules - Google Patents
- Publication number
- US20180329970A1 (application US16/044,654)
- Authority
- US
- United States
- Prior art keywords
- database
- processing units
- objects
- data
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F17/30575
Definitions
- Data can be an abstract term. In the context of computing environments and systems, data can generally encompass all forms of information storable in a computer readable medium (e.g., memory, hard disk). Data, and in particular, one or more instances of data can also be referred to as data object(s). As is generally known in the art, a data object can, for example, be an actual instance of data, a class, a type, or a particular form of data, and so on.
- the term database can also refer to a collection of data and/or data structures typically stored in a digital form. Data can be stored in a database for various reasons and to serve various entities or “users.” Generally, data stored in the database can be used by one or more of the “database users.”
- a user of a database can, for example, be a person, a database administrator, a computer application designed to interact with a database, etc.
- a very simple database or database system can, for example, be provided on a Personal Computer (PC) by storing data (e.g., contact information) on a Hard Disk and executing a computer program that allows access to the data.
- the executable computer program can be referred to as a database program, or a database management program.
- the executable computer program can, for example, retrieve and display data (e.g., a list of names with their phone numbers) based on a request submitted by a person (e.g., show me the phone numbers of all my friends in Ohio).
- Generally, databases are much more complex than the example noted above.
- databases have evolved over the years and are used in various businesses and organizations (e.g., banks, retail stores, governmental agencies, universities).
- Today, databases can be very complex.
- Some databases can support several users simultaneously and allow them to make very complex queries (e.g., give me the names of all customers under the age of thirty five (35) in Ohio that have bought all the items in a given list of items in the past month and also have bought a ticket for a baseball game and purchased a baseball hat in the past 10 years).
- a Database Manager or a Database Management System (DBMS) is provided for relatively large and/or complex databases.
- a DBMS can effectively manage the database or data stored in a database, and serve as an interface for the users of the database.
- a DBMS can be provided as an executable computer program (or software) product as is also known in the art.
- a database can be organized in accordance with a Data Model.
- Some notable Data Models include a Relational Model, an Entity-relationship model, and an Object Model.
- the design and maintenance of a complex database can require highly specialized knowledge and skills by database application programmers, DBMS developers/programmers, database administrators (DBAs), etc.
- various tools can be provided, either as part of the DBMS or as free-standing (stand-alone) software products. These tools can include specialized Database languages (e.g., Data Description Languages, Data Manipulation Languages, Query Languages). Database languages can be specific to one data model or to one DBMS type.
- One widely supported language is Structured Query Language (SQL), developed, by and large, for the Relational Model. SQL can combine the roles of a Data Description Language, a Data Manipulation Language, and a Query Language.
- databases have become prevalent in virtually all aspects of business and personal life. Moreover, usage of various forms of databases is likely to continue to grow even more rapidly and widely across all aspects of commerce, social and personal activities.
- databases and DBMS that manage them can be very large and extremely complex partly in order to support an ever increasing need to store data and analyze data.
- larger databases are used by larger organizations. Larger databases are supported by a relatively large amount of capacity, including computing capacity (e.g., processor and memory) to allow them to perform many tasks and/or complex tasks effectively at the same time (or in parallel).
- smaller database systems are also available today and can be used by smaller organizations. In contrast to larger databases, smaller databases can operate with less capacity.
- RDBMS: Relational Database Management System
- relational tables (also referred to as relations)
- rows and columns (also referred to as tuples and attributes)
- each row represents an occurrence of an entity defined by a table, with an entity, for example, being a person, place, thing, or another object about which the table includes information.
- Markup languages have been developed and used extensively in various applications and different aspects of computing systems and environments, including database systems and environments.
- Markup languages can be considered as a modern system for annotating a document in a way that is syntactically distinguishable from the text.
- XML: eXtensible Markup Language
- ANSI SQL/XML is an extension to the SQL standard that specifies a SQL-based extension for using XML in conjunction with SQL.
- In ANSI SQL/XML, an XML data type is introduced, as well as several routines and functions to support manipulation and storage of XML in databases.
- An XML schema can be a description of a class of XML documents, typically expressed in terms of constraints on the structure and content of documents of that class.
- An example of an XML schema declaration is depicted in FIG. 1 .
- a schema declaration typically includes one or more document references, in which each document reference could potentially be located on either a local server, remote server, or some server out on the World Wide Web.
- the documents referenced by the schema declaration need to be collected, pre-processed, and assembled in order to generate a self-contained XML schema that they represent.
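The collect-and-assemble step described above can be sketched as follows. This is a minimal illustration only: the in-memory `Documents` map, the function name `assemble_schema`, and the file names are assumptions made for the example and do not come from the patent, which does not specify how references are resolved.

```python
from typing import Dict, List, Tuple

# Hypothetical document store: name -> (body text, names of referenced documents).
Documents = Dict[str, Tuple[str, List[str]]]


def assemble_schema(root: str, documents: Documents) -> List[str]:
    """Collect the root document and everything it references, depth first.

    Returns the bodies in dependency order (referenced documents first),
    visiting each document once even if it is referenced several times.
    """
    ordered: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already collected; this also guards against reference cycles
        seen.add(name)
        body, references = documents[name]
        for ref in references:
            visit(ref)
        ordered.append(body)

    visit(root)
    return ordered


# Illustrative documents: "order.xsd" references two others, one of them twice removed.
docs: Documents = {
    "order.xsd": ("<order schema>", ["address.xsd", "item.xsd"]),
    "address.xsd": ("<address schema>", []),
    "item.xsd": ("<item schema>", ["address.xsd"]),
}
print(assemble_schema("order.xsd", docs))
```

In a real system each referenced document might sit on a local server, a remote server, or the Web, which is why the text treats collection as a separate step from processing.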
- the invention relates to computing environments and systems. More particularly, the invention relates to techniques for providing metadata to database systems.
- metadata can be provided to multiple processing units of a database system by using local storages respectively provided for the processing units, such that a local storage is accessible only to its respective processing unit, in accordance with one aspect of the invention.
- processing units can access metadata when needed (e.g., to process a database request at runtime or when the database system is active and processing database requests) without having to access a source external to the database system.
- a processing unit does not even need to access sources of the database system that are external to it (e.g., other processing units) to access metadata as it can effectively use its own copy of the metadata that other processing units cannot access.
- a copy of one or more XML objects can be stored in each one of multiple local storages provided as an Unhashed Dictionary table for each one of multiple processing units.
- a local storage may be divided based on various types of metadata.
- a separate local storage can be designated for a processing unit for each type of metadata it stores.
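As a rough sketch of this arrangement, each processing unit below keeps its own repositories, one per metadata type, and answers lookups only from its own copy. The class and method names are hypothetical, and Python cannot enforce that other units stay out of `_repositories`; the leading underscore only marks the intent.

```python
from typing import Dict, List


class ProcessingUnit:
    """Hypothetical processing unit with private, per-type metadata storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One repository per metadata type (e.g. "xml_schema", "xslt_stylesheet").
        self._repositories: Dict[str, Dict[str, str]] = {}

    def store(self, metadata_type: str, object_name: str, content: str) -> None:
        self._repositories.setdefault(metadata_type, {})[object_name] = content

    def lookup(self, metadata_type: str, object_name: str) -> str:
        # Reads only this unit's own copy; raises KeyError if nothing is registered.
        return self._repositories[metadata_type][object_name]


def register_everywhere(units: List[ProcessingUnit], metadata_type: str,
                        object_name: str, content: str) -> None:
    """Give every unit its own copy instead of one shared copy."""
    for unit in units:
        unit.store(metadata_type, object_name, content)


units = [ProcessingUnit("unit-A"), ProcessingUnit("unit-B")]
register_everywhere(units, "xml_schema", "order", "<order schema>")
print(units[0].lookup("xml_schema", "order"))
```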
- metadata (e.g., an XML object, XML schema, XSLT stylesheets, XQuery modules) can be registered with a database system by a database request or command, for example, by using a register statement that can be provided in accordance with one embodiment of the invention.
- registered objects can be obtained and provided for display, for example, by using a “list registered” database request or command.
- metadata can be effectively distributed to each one of the local storages designated for each one of multiple processing units of a database system by initially designating one of the processing units as a master. The master effectively broadcasts, to all of the processing units (including itself), the metadata and a number of operations to be executed by each one of the processing units in order to register and store a local copy of the metadata in the local storage designated for that processing unit.
- FIG. 1 depicts an example of an XML schema declaration.
- FIG. 2 depicts simple examples of XSLT and XQuery requiring access to documents.
- FIG. 3 depicts a computing environment that includes a metadata providing system (MDPS) for a database (or database system) that includes multiple processing units in accordance with one embodiment of the invention.
- FIG. 4 depicts a method for providing data objects to a database system that includes multiple processing units in accordance with one embodiment of the invention.
- FIG. 5 depicts a database node of a database system or a Database Management System (DBMS) in accordance with one embodiment of the invention.
- FIG. 6 depicts a method for providing data objects to a database system that includes multiple processing units in one or more database nodes in accordance with another embodiment of the invention.
- a schema declaration can include one or more document references, in which each document reference could potentially be located on either a local server, remote server, or some server out on the World Wide Web.
- these documents must be collected, pre-processed, and assembled in order to generate a self-contained XML schema that they represent.
- a database operation can require access to metadata (e.g., schema declaration depicted in FIG. 1 ) as well as actual or raw data (e.g., XML documents with the schema declaration depicted in FIG. 1 ).
- For example, some SQL/XML DML operations require simultaneous access to both a series of XML documents stored in the database and the XML schema associated with those documents in order for the operations to complete correctly.
- One issue that has to be solved is how to provide a database with runtime access to metadata (e.g. XML schema) it may need to process data associated with and/or stored in the database (e.g., actual, raw, or structured data provided in a document).
- XSLT can be a declarative, XML-based language used for the transformation of XML documents into other XML documents.
- XQuery can be considered to be a query and a functional programming language that is designed to query collections of XML data.
- FIG. 2 depicts simple examples of XSLT and XQuery requiring access to documents that would be downloaded by a conventional database system from the World Wide Web, thereby potentially exposing the database to “hangs” and OS “crashes,” as well as potential security risks.
- some database systems include multiple processing units (or processing modules) that process data.
- some database systems are designed using a “share nothing” architecture where processing units (or processing modules) are not to have direct access to data stores of each other, although they may share information, for example, via an internal network. Consequently, each one of the processing units processes data in its own storage, stored as a local copy of the data, and may not directly access data that is locally stored and processed only by another processing unit. As a result, it is not desirable, at least in some database systems, to merely provide one copy of metadata to be shared among multiple processing units.
- Other aspects of the invention include retrieval and use of metadata stored locally by each one of multiple processing units of a database system.
- FIG. 3 depicts a computing environment 100 that includes a metadata providing system (MDPS) 102 for a database (or database system) 101 that includes multiple processing units 101 A and 101 B in accordance with one embodiment of the invention.
- each of the processing units 101 A and 101 B can represent and/or can be effectively provided by one or more physical processors (e.g., Central Processing Units (CPUs)) and/or by one or more virtual processors that can effectively simulate a physical processor.
- the processing units 101 A and 101 B can, for example, be provided in or as a database node in a single or multi-node database system 101 , wherein each one of the database nodes includes one or more physical processors (not shown) that typically support multiple virtual processing units 101 A and 101 B operable to process data associated with the database system 101 (e.g., read data stored in the database system 101 , write data to the database system 101 , process database requests from the database system 101 to answer a query by providing data).
- processing units 101 A and 101 B can be provided as virtual processors supported by at least one physical processor, as those skilled in the art will readily know and appreciate. It should also be noted that generally data can be stored by the database system 101 .
- each one of the processing units 101 A and 101 B stores at least a portion of an instance of data, for example, one or more rows of a table in a storage associated with the processing units.
- the data can be stored in local storages 106 A and 106 B respectively provided for the processing units 101 A and 101 B, or in one or more other storages (not shown).
- metadata can be provided in local storages 106 A and 106 B as will be described in greater detail below.
- MDPS 102 is operable to provide metadata for the data associated with the database system 101 .
- the metadata can, for example, be XML data or an XML object (e.g., XML schema) needed to process data stored in a form consistent with one or more XML documents.
- the metadata can include geospatial data provided for analysis, in applying statistical analysis and other informational techniques to data which has a geographical or geospatial aspect.
- the metadata can include configuration information for configuring various aspects of data and/or a database system.
- MDPS 102 can, for example, be implemented in hardware and/or software by using one or more hardware and/or software components.
- the MDPS 102 can, for example, be effectively implemented by computer executable code stored on a computer readable medium (not shown) and can be executed by one or more processors (not shown).
- the processors can be part of a device (not shown), for example, a computer or computing device.
- the MDPS 102 can also be provided at least in part by one or more of the processing units 101 A and 101 B.
- one or more MDPS components 102 A and 102 B can be provided as local components instead of, or in addition to, the MDPS component 102 that can serve alone, or as a central entity coupled to the local components 102 A and 102 B.
- the MDPS 102 can at least partly be provided by the components 102 A and 102 B provided as components that are respectively local to the processing units 101 A and 101 B.
- the MDPS 102 can also be provided at least partly as one or more components that are independent of and/or external to the database system 101 . As shown in FIG. 3 , typically, it is desirable to provide the MDPS 102 as a central component with one or more local components 102 A and 102 B respectively associated with processing units 101 A and 101 B.
- the MDPS 102 can obtain (e.g., receive, search and download) one or more objects 104 pertaining to metadata for data associated with the database system 101 (database objects).
- the MDPS 102 can receive as input a database request, or command, that identifies one or more objects as one or more XML schema documents needed for processing one or more XML documents stored by the database system 101 .
- the database request or command can also indicate the location where the XML data can be obtained.
- the MDPS 102 can, for example, be operable to obtain the XML data over the Internet in response to a database command or request that identifies the XML data and indicates its location.
- after obtaining the objects 104 , the MDPS 102 can at least facilitate their storage in local storages 106 A and 106 B, respectively provided only for access by the processing unit 101 A and processing unit 101 B.
- a copy of the database objects 104 can be stored for use by each one of the processing units 101 A and 101 B in their respective local storages 106 A and 106 B, each designated for access only by their own respective processing unit.
- the processing unit 101 A can obtain the database objects 104 by accessing its own local storage 106 A but would not be able to access the copy of the database objects 104 in the local storage 106 B provided for the processing unit 101 B.
- the processing unit 101 B can obtain the database objects 104 by accessing its own local storage 106 B but would not be able to access the copy of the database objects 104 in the local storage 106 A provided for the processing unit 101 A.
- neither one of the processing units 101 A and 101 B needs to make an external access in order to obtain the database objects 104 since they are available from their respective local storages ( 106 A and 106 B).
- a client-side Host 1004 (e.g., a Personal Computer (PC), a server) can be used to log on to the database system 1000 provided as a Teradata DBS server.
- Communication between the client-side Host 1004 and the database system 1000 can be facilitated by a database communicating mechanism, for example, by an ANSI CLI (Call Level Interface) standard that can include parcel requests and responses that facilitate the movement of files resident on the client-side host 1004 over to the database system 1000 .
- the rows 1125 1-z can be distributed across the data-storage facilities 1120 1-N by the parsing engine 1130 in accordance with their primary index.
- the primary index defines the columns of the rows that are used for calculating a hash value.
- the function that produces the hash value from the values in the columns specified by the primary index may be called the hash function.
- Some portion, possibly the entirety, of the hash value can be designated a “hash bucket”.
- the hash buckets can be assigned to data-storage facilities 1120 1-N and associated processing units 1110 1-N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
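The row distribution just described (primary index columns, hash function, hash bucket, hash bucket map) can be sketched as below. The bucket count, unit count, use of SHA-256, and the round-robin bucket map are all assumptions chosen for illustration; the patent does not state which hash function or map an actual system uses.

```python
import hashlib
from typing import Dict, List, Sequence

NUM_BUCKETS = 16  # assumed number of hash buckets
NUM_UNITS = 4     # assumed number of processing units

Row = Dict[str, object]


def hash_bucket(row: Row, primary_index: Sequence[str]) -> int:
    """Hash only the primary index columns and keep part of the hash as the bucket."""
    key = "|".join(str(row[column]) for column in primary_index)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


# Hash bucket map: bucket number -> processing unit (round-robin, as an assumption).
bucket_map: Dict[int, int] = {bucket: bucket % NUM_UNITS for bucket in range(NUM_BUCKETS)}


def distribute(rows: List[Row], primary_index: Sequence[str]) -> Dict[int, List[Row]]:
    """Place each row on the unit that owns its hash bucket."""
    placement: Dict[int, List[Row]] = {unit: [] for unit in range(NUM_UNITS)}
    for row in rows:
        placement[bucket_map[hash_bucket(row, primary_index)]].append(row)
    return placement


rows = [{"id": i, "state": "OH"} for i in range(100)]
by_id = distribute(rows, ["id"])        # many distinct values: rows spread out
by_state = distribute(rows, ["state"])  # one distinct value: every row on one unit
print({unit: len(held) for unit, held in by_id.items()})
print({unit: len(held) for unit, held in by_state.items()})
```

The second call shows why the choice of primary index columns matters: a column with a single value sends every row to the same bucket and therefore the same unit.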
- a central metadata providing system component (MDPS) 1002 can be provided for the database node 1105 1 .
- each one of the processing units 1110 1-N can be effectively provided with a local MDPS component 1002 1-N .
- a database request or command 1006 can be provided by the client-side host 1004 and received by the parsing engine 1130 .
- the database request or command 1006 can be indicative of the metadata associated with the database system 1000 .
- the database request or command 1006 can be a request for registering metadata with the database system 1000 .
- a DDL statement (“REGISTER: XML SCHEMA
- a “list registered” data objects statement can be provided and used as the database request or command 1006 in accordance with one embodiment of the invention.
- a DDL statement can be provided to allow a SQL user to list a specified type of registered XML objects and/or to display one particular type of a registered XML object (e.g., XQuery modules).
- a DDL statement for registering or listing registered database objects can, for example, be stored as a dedicated parse tree in the parsing engine 1130 as its internal representation.
- work associated with solving a database request 1006 (e.g., a DDL query, a DML query)
- a “REGISTER DDL” statement internally represented as a new dedicated Parse Tree by the parsing engine 1130 will have at least one dedicated new work step associated with the registration operation: “the registration work step.”
- a “LIST REGISTERED DDL” statement will have at least one work step, namely its own work step: “the repository show step.”
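The mapping from a dedicated parse tree to its dedicated work step might look like the sketch below. The `ParseTree` and `WorkStep` types and their fields are hypothetical; only the two step names are taken from the text. The `broadcast` flag reflects the description elsewhere in this document that registration goes to all processing units while listing goes to a single one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseTree:
    """Hypothetical internal representation of a parsed DDL statement."""
    kind: str          # "register" or "list_registered"
    object_type: str   # e.g. "xml_schema", "xquery_module"


@dataclass(frozen=True)
class WorkStep:
    """Hypothetical unit of work handed to the processing units."""
    name: str
    object_type: str
    broadcast: bool    # True: sent to every unit; False: sent to one unit


def to_work_step(tree: ParseTree) -> WorkStep:
    """Turn a dedicated parse tree into its dedicated work step."""
    if tree.kind == "register":
        return WorkStep("registration work step", tree.object_type, broadcast=True)
    if tree.kind == "list_registered":
        return WorkStep("repository show step", tree.object_type, broadcast=False)
    raise ValueError(f"no dedicated work step for statement kind {tree.kind!r}")


print(to_work_step(ParseTree("register", "xml_schema")))
print(to_work_step(ParseTree("list_registered", "xquery_module")))
```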
- For each type of metadata there can be a special, separate storage (or repository).
- For XML objects, including XML schema, XSLT stylesheets, and XQuery modules, there can be a designated repository provided as an Unhashed Dictionary table, present on each and every one of the processing units 1110 1-N .
- registering metadata can, for example, be done via a register statement provided as a database command, request, or statement 1006 .
- a user of the database system 1000 can place all relevant files on the client-side host 1004 and issue a registration command, request, or statement 1006 for the purpose of registering a particular XML object.
- This registration command, request, or statement 1006 can, for example, be of the form:
- the parsing engine 1130 can parse the statement above into its associated dedicated registration parse tree.
- computer code provided in a kernel of an operating system operating on the database node 1105 1 of the database system 1000 can effectively parse the statement, as those skilled in the art will readily know.
- a dedicated parse tree can eventually be turned into a dedicated registration work step by the parsing engine 1130 .
- the registration work step can be broadcast to all of the processing units (e.g., AMPs) 1110 1-N .
- the registration work step can then be processed as described below in an exemplary process, which uses XML, in view of its current prevalence, as an example of metadata that can be registered with a database system in accordance with one embodiment of the invention.
- a single one of the processing units 1110 1-N can behave as a master with respect to other processing units and itself.
- the master processing unit can, for example, use the underlying database mechanism (e.g., CLI) to bring all the files that are to be assembled into a self-contained XML object.
- the master processing unit can then process the files and assemble the XML object, for example, into a self-contained XML object. Thereafter, the master processing unit can broadcast the object (e.g., a self-contained XML object) to all other processing units, including itself (the master processing unit). Consequently, all of the processing units can behave as slaves in this context with respect to the broadcast made by the master processing unit.
- all of the processing units can store a copy of a common XML object into their local storages 1200 1-N , for example, provided as an unhashed dictionary repository.
- the local storages 1200 1-N can at least be conceptually separate from the storages 1120 1-N , or provided as designated locations of the storages 1120 1-N .
- all of the processing units 1110 1-N can have an identical XML object 1200 1-N in their local repositories.
- the last processing unit 1110 1-N to complete the storing can send an operation complete message back to the parsing engine 1130 to effectively signal the completion of the registration operation.
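The master-driven registration above can be simulated in a few lines. This is a sequential sketch under stated assumptions: the `Unit` class and `register` function are hypothetical, plain string concatenation stands in for real XML assembly, and a real system would broadcast over an internal network rather than loop.

```python
from typing import Dict, List


class Unit:
    """Hypothetical processing unit with its own local repository."""

    def __init__(self, unit_id: int) -> None:
        self.unit_id = unit_id
        self.repository: Dict[str, str] = {}

    def store(self, name: str, xml_object: str) -> None:
        self.repository[name] = xml_object


def register(units: List[Unit], name: str, files: List[str], master_index: int = 0) -> int:
    """Assemble on the master, store on every unit, return the id of the last unit to finish."""
    master = units[master_index]
    # Only the master gathers the files and assembles one self-contained object.
    xml_object = "\n".join(files)
    last_done = master.unit_id
    # Broadcast: every unit, the master included, stores an identical copy.
    for unit in units:
        unit.store(name, xml_object)
        last_done = unit.unit_id
    # The last unit to finish is the one that signals completion.
    return last_done


units = [Unit(i) for i in range(3)]
reporter = register(units, "order", ["<part one>", "<part two>"], master_index=1)
print(reporter, units[0].repository["order"] == units[2].repository["order"])
```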
- any operation (e.g., a DML that contains an XML operation requiring access to the XML object) can be accommodated simply by modifying its work steps to reference a specific one of the local repositories 1200 1-N of a particular processing unit 1110 1-N in order to access the registered object.
- a list or display of the registered objects can be accomplished by sending its work step to one of the processing units 1110 1-N that can, for example, be randomly selected.
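Because every processing unit holds an identical copy, any single unit can answer a listing request. A minimal sketch, assuming a hypothetical `Repository` shape (metadata type mapped to registered names) and a hypothetical `list_registered` helper:

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical repository shape: metadata type -> names of registered objects.
Repository = Dict[str, List[str]]


def list_registered(repositories: List[Repository], object_type: str,
                    rng: Optional[random.Random] = None) -> Tuple[int, List[str]]:
    """Pick one unit at random and list what it has registered for the given type."""
    rng = rng or random.Random()
    chosen = rng.randrange(len(repositories))
    return chosen, sorted(repositories[chosen].get(object_type, []))


# Three units holding identical copies, as they would after a registration.
repos: List[Repository] = [
    {"xml_schema": ["order", "invoice"], "xquery_module": ["totals"]}
    for _ in range(3)
]
unit, names = list_registered(repos, "xml_schema")
print(unit, names)
```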
- a list or display of the registered objects for XML objects can be provided as:
- a read table lock can be placed on a local storage 1200 i (shown in FIG. 4 ) provided as an unhashed dictionary table, every time a DML XML operation, or a display object DDL statement, is issued that may require access to the unhashed dictionary table.
- a write table lock can be placed on the unhashed dictionary table every time an XML registration DDL statement is issued. If a write lock finds that there are currently readers associated with the unhashed dictionary table, it can merely block and wait for all the readers to finish.
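The read and write table locks described above behave like a readers-writer lock. The sketch below is one simple way to get that behavior with Python's `threading.Condition`; the `TableLock` class is hypothetical and is not the database's actual locking mechanism. Note that this simple form lets a steady stream of readers delay a writer indefinitely.

```python
import threading


class TableLock:
    """Hypothetical readers-writer lock for one unhashed dictionary table."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:          # readers wait only for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a writer waiting for readers to finish

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()        # block until all current readers are done
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


lock = TableLock()
lock.acquire_read()    # e.g. a DML XML operation reading the table
lock.release_read()
lock.acquire_write()   # e.g. an XML registration DDL statement
lock.release_write()
```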
- FIG. 6 depicts a method 600 for providing data objects to a database system that includes multiple processing units in one or more database nodes in accordance with another embodiment of the invention.
- each of the database nodes can include at least one processor operable to process at least one portion of data for the database system.
- at least one of the database nodes can include first and second processing units and first and second local storages that can be accessed only by the first and second processing units, respectively.
- Method 600 can, for example, be used by the database system 101 (shown in FIG. 3 ).
- method 600 can wait for a determination ( 602 ) that one or more objects are to be registered.
- a database request can be made to register one or more database objects.
- the database request can provide the database objects and/or a reference to their location.
- one or more database requests identifying one or more database objects and/or the database objects themselves (i.e., content of database object(s)) can be obtained ( 604 ) for registration.
- one or more tasks or work steps for registering the database object(s) can be determined ( 606 ) for multiple processing nodes and transmitted ( 606 ) to one of the processing units that can, for example, be selected at random, to serve as a master processing unit with respect to the other processing unit(s) and the determined ( 606 ) tasks or work steps for registering the database object(s).
- the one or more tasks or work steps can be obtained ( 608 ) by the processing unit selected to serve as a master (master processing unit).
- the master processing unit can then broadcast ( 610 ) the task(s) or work step(s) to all processing units, including the master processing unit itself.
- the processing units can be part of a database node of a multi-node database system.
- Each one of the processing units can perform the task(s) or work step(s) for registration of the database object(s) to effectively store ( 612 ) one or more database objects in its own local storage, which is not accessible to any other processing unit.
- The processing unit determined ( 614 ) to be the last one to complete the task(s) or work step(s) for registration and storage of the one or more database object(s) can send back a completion indication, so that it can be determined that the registration has been successfully completed before the method 600 ends.
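- By way of illustration only, the flow of method 600 can be sketched in Python as shown below. The sketch is a simplified, single-process simulation under stated assumptions: the names `ProcessingUnit`, `perform`, and `register_objects` are hypothetical and do not appear in the specification, the broadcast is modeled as a sequential loop rather than as concurrent messaging over an internal network, and the "last unit to finish" is simply the last unit in that loop.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for a processing unit with its own local storage."""
    unit_id: int
    # Each unit gets its own dictionary; no other unit holds a reference to it.
    local_storage: dict = field(default_factory=dict)

    def perform(self, work_step):
        # Carry out the registration work step by storing a local copy (612).
        for name, content in work_step["objects"].items():
            self.local_storage[name] = content
        return self.unit_id


def register_objects(units, objects, rng=random):
    # Determine the work step for registering the objects (606).
    work_step = {"action": "register", "objects": dict(objects)}
    # Select one unit, here at random, to serve as the master (606, 608).
    master = rng.choice(units)
    # The master broadcasts the work step to all units, including itself (610).
    # Assumption: a sequential loop stands in for the real broadcast.
    completed = [unit.perform(work_step) for unit in units]
    # The last unit to complete sends back the completion indication (614).
    return {
        "master": master.unit_id,
        "last_to_finish": completed[-1],
        "registered": len(completed) == len(units),
    }


if __name__ == "__main__":
    units = [ProcessingUnit(i) for i in range(4)]
    print(register_objects(units, {"order.xsd": "<xs:schema/>"}))
```

- In this sketch every unit ends up holding its own copy of each registered object, so a later lookup by a unit needs neither another unit's storage nor any external source.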
- implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
- the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them.
- data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
- a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program does not necessarily correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read only memory or a random access memory or both.
- the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
- Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CDROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, tactile or near-tactile input.
- Implementations of the subject matter described in this specification can be implemented in a computing system that includes a backend component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a frontend component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such backend, middleware, or frontend components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Abstract
Description
- This application is a continuing application of U.S. patent application Ser. No. 13/688,767, by Gregory Howard Milby, Guofang Li, Kevin Dean Virgil, and Michael Leon Reed, entitled “PROVIDING METADATA TO DATABASE SYSTEMS AND ENVIRONMENTS WITH MULTIPLE PROCESSING UNITS OR MODULES,” filed on Nov. 29, 2012, which is hereby incorporated by reference herein in its entirety and for all purposes.
- Data can be an abstract term. In the context of computing environments and systems, data can generally encompass all forms of information storable in a computer readable medium (e.g., memory, hard disk). Data, and in particular, one or more instances of data can also be referred to as data object(s). As is generally known in the art, a data object can, for example, be an actual instance of data, a class, a type, or a particular form of data, and so on.
- The term database can also refer to a collection of data and/or data structures typically stored in a digital form. Data can be stored in a database for various reasons and to serve various entities or “users.” Generally, data stored in the database can be used by one or more of the “database users.” A user of a database can, for example, be a person, a database administrator, a computer application designed to interact with a database, etc. A very simple database or database system can, for example, be provided on a Personal Computer (PC) by storing data (e.g., contact information) on a Hard Disk and executing a computer program that allows access to the data. The executable computer program can be referred to as a database program, or a database management program. The executable computer program can, for example, retrieve and display data (e.g., a list of names with their phone numbers) based on a request submitted by a person (e.g., show me the phone numbers of all my friends in Ohio).
- Generally, database systems are much more complex than the example noted above. In addition, databases have evolved over the years and are used in various businesses and organizations (e.g., banks, retail stores, governmental agencies, universities). Today, databases can be very complex. Some databases can support several users simultaneously and allow them to make very complex queries (e.g., give me the names of all customers under the age of thirty five (35) in Ohio that have bought all the items in a given list of items in the past month and also have bought a ticket for a baseball game and purchased a baseball hat in the past 10 years).
- Typically, a Database Manager (DBM) or a Database Management System (DBMS) is provided for relatively large and/or complex databases. As known in the art, a DBMS can effectively manage the database or data stored in a database, and serve as an interface for the users of the database. For example, a DBMS can be provided as an executable computer program (or software) product as is also known in the art.
- It should also be noted that a database can be organized in accordance with a Data Model. Some notable Data Models include a Relational Model, an Entity-relationship model, and an Object Model. The design and maintenance of a complex database can require highly specialized knowledge and skills by database application programmers, DBMS developers/programmers, database administrators (DBAs), etc. To assist in design and maintenance of a complex database, various tools can be provided, either as part of the DBMS or as free-standing (stand-alone) software products. These tools can include specialized Database languages (e.g., Data Description Languages, Data Manipulation Languages, Query Languages). Database languages can be specific to one data model or to one DBMS type. One widely supported language is Structured Query Language (SQL), developed, by and large, for the Relational Model, which can combine the roles of a Data Description Language, a Data Manipulation Language, and a Query Language.
- Today, databases have become prevalent in virtually all aspects of business and personal life. Moreover, usage of various forms of databases is likely to continue to grow even more rapidly and widely across all aspects of commerce, social and personal activities. Generally, databases and the DBMSs that manage them can be very large and extremely complex, partly in order to support an ever increasing need to store data and analyze data. Typically, larger databases are used by larger organizations. Larger databases are supported by a relatively large amount of capacity, including computing capacity (e.g., processor and memory) to allow them to perform many tasks and/or complex tasks effectively at the same time (or in parallel). On the other hand, smaller database systems are also available today and can be used by smaller organizations. In contrast to larger databases, smaller databases can operate with less capacity.
- A current popular type of database is the relational database with a Relational Database Management System (RDBMS), which can include relational tables (also referred to as relations) made up of rows and columns (also referred to as tuples and attributes). In a relational database, each row represents an occurrence of an entity defined by a table, with an entity, for example, being a person, place, thing, or another object about which the table includes information.
- Recently, markup languages have been developed and used extensively in various applications and different aspects of computing systems and environments, including database systems and environments. Generally, Markup languages can be considered as a modern system for annotating a document in a way that is syntactically distinguishable from the text.
- More recently, a particular type of a markup language, namely, eXtensible Markup Language (XML) has been developed as a text-based format that can represent structured information. XML can be widely used for the representation of arbitrary data structures. As such, there is a general desire to store XML documents in databases and database systems.
- In addition, yet another very recent development is ANSI SQL/XML, an extension to the SQL standard that specifies a SQL-based extension for using XML in conjunction with SQL. In ANSI SQL/XML, an XML data type is introduced, as well as several routines and functions to support manipulation and storage of XML in databases.
- Use of ANSI SQL/XML can require a fundamental building block for XML technology, namely “XML Schema.” An XML schema can be a description of a class of XML documents, typically expressed in terms of constraints on the structure and content of documents of that class. An example of an XML schema declaration is depicted in FIG. 1.
- Referring to FIG. 1, typically, a schema declaration includes one or more document references, in which each document reference could potentially be located on either a local server, remote server, or some server out on the World Wide Web. Typically, the documents referenced by the schema declaration need to be collected, pre-processed, and assembled in order to generate a self-contained XML schema that they represent.
- In view of the foregoing, it should be noted that techniques for using XML in database systems and environments are useful.
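- By way of illustration only, the first part of the collection step noted above (finding the document references carried by a schema declaration) can be sketched in Python as shown below. The sketch assumes that the references are expressed with the standard XML Schema `include`, `import`, and `redefine` elements and their `schemaLocation` attribute. The sample schema text and the function name `referenced_locations` are hypothetical and are not taken from FIG. 1. Fetching, pre-processing, and assembling the referenced documents are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema language.
XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Top-level schema elements that refer to other schema documents.
REFERENCING_ELEMENTS = ("include", "import", "redefine")


def referenced_locations(schema_text):
    """Return the schemaLocation values referenced by a schema document."""
    root = ET.fromstring(schema_text)
    qualified = {f"{{{XSD_NS}}}{tag}" for tag in REFERENCING_ELEMENTS}
    locations = []
    for child in root:
        if child.tag in qualified:
            location = child.get("schemaLocation")
            if location is not None:
                locations.append(location)
    return locations


# Hypothetical schema declaration with one local and one remote reference.
SAMPLE_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="address.xsd"/>
  <xs:import namespace="http://example.com/items"
             schemaLocation="http://example.com/schemas/items.xsd"/>
  <xs:element name="order" type="xs:string"/>
</xs:schema>"""

if __name__ == "__main__":
    for location in referenced_locations(SAMPLE_SCHEMA):
        print(location)
```

- Each returned location would still have to be resolved and its content obtained before a self-contained schema could be assembled.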
- Broadly speaking, the invention relates to computing environments and systems. More particularly, the invention relates to techniques for providing metadata to database systems.
- In accordance with one aspect of the invention, metadata can be provided to multiple processing units of a database system by using local storages respectively provided for the processing units, such that a local storage is accessible only to its respective processing unit. As a result, processing units can access metadata when needed (e.g., to process a database request at runtime or when the database system is active and processing database requests) without having to access a source external to the database system. In fact, a processing unit does not even need to access sources of the database system that are external to it (e.g., other processing units) to access metadata as it can effectively use its own copy of the metadata that other processing units cannot access. By way of example, a copy of one or more XML objects can be stored in each one of multiple local storages provided as an Unhashed Dictionary table for each one of multiple processing units. In the example, if desired, a local storage may be divided based on various types of metadata. Alternatively, in effect, a separate local storage can be designated for a processing unit for each type of metadata it stores.
- In accordance with another aspect of the invention, metadata (e.g., an XML object, XML schema, XSLT stylesheets, XQuery modules) can be provided using a database request or command, for example, by using a register statement that can be provided in accordance with one embodiment of the invention. In addition, registered objects can be obtained and provided for display, for example, by using a “list registered” database request or command.
- In accordance with yet another aspect of the invention, metadata can be effectively distributed to each one of the local storages designated for each one of multiple processing units of a database system by initially designating one of the processing units as a master that effectively broadcasts, to all of the processing units (including itself), a number of operations to be executed by each one of the processing units in order to effectively register and store a local copy of the metadata in each of the local storages respectively designated for each one of the processing units.
- Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
- The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
- FIG. 1 depicts an example of an XML schema declaration.
- FIG. 2 depicts simple examples of XSLT and XQuery requiring access to documents.
- FIG. 3 depicts a computing environment that includes a metadata providing system (MDPS) for a database (or database system) that includes multiple processing units in accordance with one embodiment of the invention.
- FIG. 4 depicts a method for providing data objects to a database system that includes multiple processing units in accordance with one embodiment of the invention.
- FIG. 5 depicts a database node of a database system or a Database Management System (DBMS) in accordance with one embodiment of the invention.
- FIG. 6 depicts a method for providing data objects to a database system that includes multiple processing units in one or more database nodes in accordance with another embodiment of the invention.
- As noted in the background section, techniques for using XML in database systems and environments are useful. As shown in FIG. 1, a schema declaration can include one or more document references, in which each document reference could potentially be located on either a local server, remote server, or some server out on the World Wide Web. Typically, these documents must be collected, pre-processed, and assembled in order to generate a self-contained XML schema that they represent.
- Generally, a database operation can require access to metadata (e.g., the schema declaration depicted in FIG. 1) as well as actual or raw data (e.g., XML documents with the schema declaration depicted in FIG. 1). By way of example, there are several SQL/XML DML operations that require simultaneous access to both a series of XML documents stored in the database and the XML schema associated with those documents in order for the operations to complete correctly. One issue that has to be solved is how to provide a database with runtime access to metadata (e.g., XML schema) it may need to process data associated with and/or stored in the database (e.g., actual, raw, or structured data provided in a document).
- However, it is not desirable to allow the internal components or tasks that implement a database to access sources external to the database (external database sources) in order to obtain the metadata needed to process data associated with and/or stored by the database. As such, for example, permitting the internal tasks that implement the database to access either a remote server or web documents at DML runtime could be problematic and is not a desirable solution, as it could subject the internal kernel (or central, non-user space) of the Operating System (OS) of the database to potential hangs and OS crashes that would be beyond the kernel's ability to control.
- This problem also applies to various forms of metadata, including, for example, XSLT stylesheets and XQuery modules, just to name a couple of examples pertaining to XML or XML objects. To elaborate, XSLT can be a declarative, XML-based language used for the transformation of XML documents into other XML documents. XQuery can be considered to be a query and functional programming language that is designed to query collections of XML data. FIG. 2 depicts simple examples of XSLT and XQuery requiring access to documents that would be downloaded by a conventional database system from the World Wide Web, thereby potentially exposing the database to “hangs” and OS “crashes,” as well as potential security risks.
- Another consideration is that in some database systems it is not desirable and/or permissible for processing units (or processing modules that process data) to have common access to data. In other words, it is not desirable and/or permissible to allow processing units to read from or write to the same storage location. In fact, as those skilled in the art will readily appreciate, some database systems are designed using a “share nothing” architecture where processing units (or processing modules) are not to have direct access to the data stores of each other, although they may share information, for example, via an internal network. Consequently, each one of the processing units processes data in its own storage, stored as a local copy of the data, and may not directly access data that is stored in another location and is locally stored and processed only by another processing unit. As a result, it is not desirable, at least in some database systems, to merely provide one copy of metadata to be shared among multiple processing units.
- In view of the foregoing, improved techniques for providing metadata to database systems, especially, database systems with processing units that are not to directly share data stores, are needed and would be very useful.
- Accordingly, it will be appreciated that metadata can be provided to multiple processing units of a database system by using local storages respectively provided for the processing units, such that a local storage is accessible only to its respective processing unit, in accordance with one aspect of the invention. As a result, processing units can access metadata when needed (e.g., to process a database request at runtime or when the database system is active and processing database requests) without having to access a source external to the database system. In fact, a processing unit does not even need to access sources of the database system that are external to it (e.g., other processing units) to access metadata as it can effectively use its own copy of the metadata that other processing units cannot access. By way of example, a copy of one or more XML objects can be stored in each one of multiple local storages provided as an Unhashed Dictionary table for each one of multiple processing units. In the example, if desired, a local storage may be divided based on various types of metadata. Alternatively, in effect, a separate local storage can be designated for a processing unit for each type of metadata it stores.
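- By way of illustration only, a local storage that is divided by type of metadata can be sketched in Python as shown below. The sketch is a simplified in-memory model under stated assumptions: the names `LocalRepository`, `build_units`, and `distribute` are hypothetical, a plain dictionary per metadata type stands in for the Unhashed Dictionary table, and the three type names are taken from the metadata types named in this description.

```python
# Metadata types named in the description; one table is kept per type.
METADATA_TYPES = ("XML SCHEMA", "XSLT STYLESHEET", "XQUERY MODULE")


class LocalRepository:
    """Hypothetical local storage owned by exactly one processing unit."""

    def __init__(self):
        # One separate table per metadata type.
        self._tables = {metadata_type: {} for metadata_type in METADATA_TYPES}

    def store(self, metadata_type, name, content):
        if metadata_type not in self._tables:
            raise ValueError(f"unknown metadata type: {metadata_type}")
        self._tables[metadata_type][name] = content

    def fetch(self, metadata_type, name):
        # Served from the unit's own copy; no external source is consulted.
        return self._tables[metadata_type][name]

    def names(self, metadata_type):
        return sorted(self._tables[metadata_type])


def build_units(count):
    # Each processing unit is given its own repository instance.
    return {unit_id: LocalRepository() for unit_id in range(count)}


def distribute(units, metadata_type, name, content):
    # Place a copy of the object in every unit's local repository.
    for repository in units.values():
        repository.store(metadata_type, name, content)


if __name__ == "__main__":
    units = build_units(3)
    distribute(units, "XML SCHEMA", "order", "<xs:schema/>")
    print(units[2].names("XML SCHEMA"))
```

- Because every repository is a distinct object, a processing unit in this sketch resolves metadata from its own copy only, which mirrors the “share nothing” arrangement discussed above.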
- In accordance with another aspect of the invention, metadata (e.g., an XML object, XML schema, XSLT stylesheets, XQuery modules) can be provided using a database request or command, for example, by using a register statement that can be provided in accordance with one embodiment of the invention. In addition, registered objects can be obtained and provided for display, for example, by using a “list registered” database request or command.
- In accordance with yet another aspect of the invention, metadata can be effectively distributed to each one of the local storages designated for each one of multiple processing units of a database system by initially designating one of the processing units as a master that effectively broadcasts, to all of the processing units (including itself), a number of operations to be executed by each one of the processing units in order to effectively register and store a local copy of the metadata in each of the local storages respectively designated for each one of the processing units.
- Other aspects of the invention include retrieval and use of metadata stored locally by each one of multiple processing units of a database system.
- Embodiments of these aspects of the invention are also discussed below with reference to
FIGS. 3-6 . However, those skilled in the art will readily appreciate that the detailed description given herein with respect to these figures is for explanatory purposes as the invention extends beyond these limited embodiments. -
FIG. 3 depicts acomputing environment 100 that includes a metadata providing system (MDPS) 102 for a database (or database system) 101 that includemultiple processing units 101A and 101B in accordance with one embodiment of the invention. As will be discussed in greater detail below, each of theprocessing units 101A and 101B can represent and/or can be effectively provided by one or more physical processors (e.g., Central Processing Units (CPU's)) and/or by one or more virtual processors that can effectively simulate a physical processor. Theprocessing units 101A and 101B can, for example, be provided in or as a database node in a single or multi-node database system 101, wherein each one of the database nodes includes one or more physical processors (not shown) that typically support multiplevirtual processing units 101A and 101B operable to process data associated with the database system 101 (e.g., read data stored in the database system 101, write data to the database system 101, process database requests from the database system 101 to answer a query by providing data). Typically,processing units 101A and 10B can be provided as virtual processor supported by at least one physical processor, as those skilled in the art will readily know and appreciate. It should also be noted that generally data can be stored by the database system 101. Typically, each one of theprocessing units 101A and 101B stores at least a portion of an instance of data, for example, one or more rows of a table in a storage associated with the processing units. For example, the Data can be stored inlocal storages processing units 101A and 101B, or in one or more other storages (not shown). However, it should be noted that metadata can be provided inlocal storages - Referring to
FIG. 3 ,MDPS 102 is operable to provide metadata for the data associated with the database system 101. The metadata can, for example, be XML data or an XML object (e.g., XML schema) needed to process data stored in a form consistent with one or more XML documents. As another example, the metadata can include data Geospatial data provided for analysis in applying statistical analysis and other informational techniques to data which has a geographical or geospatial aspect. As yet another example, the metadata can include configuration information for configuring various aspects of a data and/or a database system. - It will be appreciated that
MDPS 102 can, for example, be implemented in hardware and/or software by using one or more hardware and/or software components. A such, theMDPS 102 can, for example, be effectively implemented by computer executable code stored on a computer readable medium (not shown) and can be executed by one or more processors (not shown). Those skilled in the art will readily appreciate that the processors can be part of a device (not shown) for example, a computer or computing device. - Those skilled in the art will also readily know and appreciate that the
MDPS 102 can also be provided at least in part by the one or more of theprocessing units 101A and 101B. In other words, one ormore MDPS components MDPS component 102 that can serve alone, or as a central entity coupled to thelocal components MDPS 102 can at least partly be provided by thecomponents processing units 101A and 101B. It should also be noted that theMPDS 102 can also be provided at least partly as one or more components that are independent and/or external to the database system 101. As shown inFIG. 3 , typically, it is desirable to provide theMDPS 102 as a central component with one or morelocal components processing nodes 101A and 101B. - In any case, referring to
FIG. 3 , conceptually, theMDPS 102 can obtain (e.g., receive, search and download) one ormore objects 104 pertaining to metadata for data associated with the database system 101 (database objects). By way of example, theMDPS 102 can receive as input a database request, or command, that identifies one or more objects as one or more XML schema documents needed for processing one or more XML documents stored by the database system 101. In the example, the database request or command can also indicate the location where the XML data can be obtained. As such, theMDPS 102 can, for example, be operable to obtain the XML data over the Internet in response to a database command or request that identify them and indicates their location. - Generally, however, after obtaining one or more database objects 104 pertaining to metadata,
MDPS 102 can at least facilitate their storage inlocal storages processing unit 101A and processing unit 101B. In other words, a copy of the database objects 104 can be stored for use by each one of theprocessing units 101A and 101B in their respectivelocal storages 106 and 106B, each designated for access only by their own respective processing unit. As a result, theprocessing units 101A can obtain the data objects 104 by accessing its ownlocal storage 106A but would not be able to access the copy of the database objects 104 in thelocal storage 106B provided for the processing units 101B. Similarly, the database node 101B can obtain the data objects 104 by accessing its ownlocal storage 106B but would not be able to access the copy of the database objects 104 in thelocal storage 106A provided for theprocessing units 101A. In addition, neither one of thedatabase nodes 101A and 101B need to make an external access in order to obtain the data objects 104 since it is available from their respective local storage (106A and 106B). - For example, a client-side Host 1004 (e.g., a Personal Computer (PC), a server) can, be used to logon to the
database system 1000 provided as a Teradata DBS server. Commination between the client-side Host 1004 and thedatabase system 1000 can be facilitated by a database communicating mechanism, for example, by an ANSI CLI (Call Level Interface) standard that can include parcel requests and responses that facilitate the movement of files resident on the client-side host 1004 over to thedatabase system 1000. - For example, the rows 1125 1-z can be distributed across the data-storage facilities 1120 1-N by the
parsing engine 1130 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index may be called the hash function. Some portion, possibly the entirety, of the hash value can be designated a “hash bucket”. As such, the hash buckets can be assigned to data-storage facilities 1120 1-N and associated processing units 1110 1-N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed. - Referring again to
FIG. 5 , it should be noted that a central metadata providing system component (MDPS) 1002 can be provided for the database node 1105 1. In addition, each one of the processing units 1110 1-N can be effectively provided with alocal MDPS component 1002 1-N. - A database request or
command 1006 can be provided by the client-side host 1004 and received by the parsing engine 1005. The database request orcommand 1006 can be indicative of the metadata associated with thedatabase system 1000. In accordance with one embodiment, the database request orcommand 1006 can be a request for registering metadata with thedatabase system 1000. - By way of example, a DDL statement: (“REGISTER: XML SCHEMA|XSLT STYLESHEET|XQUERY MODULE) can be provided to facilitate registration of XML objects including one or more of: an XML schema, an XSLT Stylesheet, and an XQuery module. Similarly, a list registered data objects can be provided and used the database request or
command 1006 in accordance with one embodiment of the invention. By way of example, a DDL statement can be provided to allow a SQL user to list a specified type of registered XML objects and/or to display one particular type of a registered XML object (e.g., XQuery modules). - Typically, an internal representation (e.g., a Parse Tree) of a database request or
command 1006 can be stored and used by the parsing engine 1130. As such, a DDL statement for registering or listing registered database objects can, for example, be stored as a dedicated parse tree in the parsing engine 1130 as its internal representation. Generally, work associated with solving a database request 1004 (e.g., a DDL query, a DML query) can be provided by the parsing engine 1130 as a series of work steps. As such, for example, a “REGISTER DDL” statement, internally represented as a new dedicated Parse Tree by the parsing engine 1130, will have at least one dedicated new work step associated with the registration operation: “the registration work step.” Similarly, a “LIST REGISTERED DDL” statement will have at least one work step, namely its own work step: “the repository show step.” It should be noted that for each type of metadata there can be a special, separate storage (or repository). For example, for each type of XML object, including XML schemas, XSLT stylesheets and XQuery modules, there can be a designated repository provided as an Unhashed Dictionary table, present on each and every one of the processing units 1110 1-N. - As noted above, registering metadata can, for example, be done via a register statement provided as a database command, request, or
statement 1006. By way of example, a user of the database system 1000 can place all relevant files on the client-side host 1004 and issue a registration command, request, or statement 1006 for the purpose of registering a particular XML object. This registration command, request, or statement 1006 can, for example, be of the form: -
REGISTER <XML SCHEMA|XSLT STYLESHEET|XQUERY MODULE> <object_name> <object_value> [referred_doc_location_1 referred_doc_content_1, referred_doc_location_2 referred_doc_content_2 , ...] - Where:
- “object_name”: is an in-database identifier for the to-be-registered object.
- “object_value”: is the content of the object.
- “referred_doc_location_n”: is a location of the document that can, for example, be used to identify itself in the assembled object.
- “referred_doc_content_n”: is the content of the document.
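The statement form above can be modeled as a small validated record. The following is a minimal Python sketch, not the parsing engine's actual representation: the names `RegisterStatement`, `ReferredDoc`, and `make_register_statement` are hypothetical, and the case and whitespace normalization of the object type is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# The three object types named in the REGISTER statement form.
OBJECT_TYPES = ("XML SCHEMA", "XSLT STYLESHEET", "XQUERY MODULE")


@dataclass(frozen=True)
class ReferredDoc:
    """One referred_doc_location_n / referred_doc_content_n pair."""
    location: str
    content: str


@dataclass(frozen=True)
class RegisterStatement:
    """Hypothetical in-memory form of a REGISTER statement."""
    object_type: str
    object_name: str
    object_value: str
    referred_docs: Tuple[ReferredDoc, ...] = ()


def make_register_statement(
    object_type: str,
    object_name: str,
    object_value: str,
    referred: Iterable[Tuple[str, str]] = (),
) -> RegisterStatement:
    # Normalizing case and spacing is an assumption of this sketch.
    kind = " ".join(object_type.upper().split())
    if kind not in OBJECT_TYPES:
        raise ValueError(f"unsupported object type: {object_type!r}")
    if not object_name:
        raise ValueError("object_name must not be empty")
    docs = tuple(ReferredDoc(loc, content) for loc, content in referred)
    return RegisterStatement(kind, object_name, object_value, docs)
```

For instance, `make_register_statement("xml schema", "order_schema", "<schema/>", [("types.xsd", "<types/>")])` yields a record whose type is `"XML SCHEMA"` and which carries one referred document.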
- The
parsing engine 1130 can parse the statement above into its associated dedicated registration parse tree. For example, computer code provided in a kernel of an operating system operating on the database node 1105 1 of the database system 1000 can effectively parse the statement, as those skilled in the art will readily know. In addition, a dedicated parse tree can eventually be turned into a dedicated registration work step by the parsing engine 1130. The registration work step can be broadcast to all of the processing units (e.g., AMPs) 1110 1-N. - The registration work step can then be processed as described below in an exemplary process, which uses XML, in view of its current prevalence, as an example of metadata that can be registered with a database system in accordance with one embodiment of the invention.
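The path from a dedicated parse tree to a work step that is broadcast to every processing unit can be sketched as follows. This is an illustrative Python sketch only: the dictionary-shaped parse tree, the `WorkStep` record, and the functions `plan_work_step` and `broadcast` are hypothetical stand-ins and do not reflect the parsing engine's internal structures.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Statement kinds mapped to the work steps named in the description.
STEP_FOR_STATEMENT = {
    "REGISTER": "registration",
    "LIST REGISTERED": "repository show",
}


@dataclass(frozen=True)
class WorkStep:
    """Hypothetical work step derived from a dedicated parse tree."""
    name: str
    object_type: str
    object_name: str


def plan_work_step(parse_tree: Dict[str, str]) -> WorkStep:
    # The parse tree is modeled as a flat dict for the sake of the sketch.
    statement = parse_tree["statement"]
    if statement not in STEP_FOR_STATEMENT:
        raise ValueError(f"no dedicated work step for {statement!r}")
    return WorkStep(
        name=STEP_FOR_STATEMENT[statement],
        object_type=parse_tree["object_type"],
        object_name=parse_tree.get("object_name", ""),
    )


def broadcast(step: WorkStep, unit_ids: Iterable[int]) -> Dict[int, WorkStep]:
    # Every processing unit receives the same work step.
    return {unit_id: step for unit_id in unit_ids}
```

Here a registration parse tree yields a single `registration` work step, and `broadcast(step, range(4))` delivers that same step to four hypothetical processing units.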
- Specifically, referring to
FIG. 5, a single one of the processing units 1110 1-N can behave as a master with respect to other processing units and itself. The master processing unit can, for example, use the underlying database mechanism (e.g., CLI) to bring all the files that are to be assembled into a self-contained XML object. The master processing unit can then process the files and assemble them, for example, into a self-contained XML object. Thereafter, the master processing unit can broadcast the object (e.g., a self-contained XML object) to all other processing units, including itself (the master processing unit). Consequently, all of the processing units can behave as slaves in this context with respect to the broadcast made by the master processing unit. In response to the broadcast, all of the processing units can store a copy of a common XML object into their local storages 1200 1-N, for example, provided as an unhashed dictionary repository. As those skilled in the art will readily appreciate, the local storages 1200 1-N can at least be conceptually separate from the storages 1120 1-N, or provided as designated locations of the storages 1120 1-N. - In any case, as a result of the broadcast, all of the processing units 1110 1-N can have an identical XML object 1200 1-N in their local repositories. The last processing unit 1110 1-N to complete the storing can send an operation complete message back to the
parsing engine 1130 to effectively signal the completion of the registration operation. - Although the foregoing used XML objects as an example, it will readily be appreciated that the techniques noted above can be used to register virtually any type of metadata (e.g., Geospatial data, configuration data) as the techniques do not place any constraints on the type of metadata that can be registered with a database system.
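The master-and-broadcast flow described above can be simulated in a few lines. The following Python sketch is a sequential, single-process illustration under stated assumptions: `ProcessingUnit`, `assemble`, and `register` are hypothetical names, the local repository is modeled as a plain dictionary rather than an unhashed dictionary table, and "last to complete" is simply the last unit visited in the loop, whereas real processing units would finish in a nondeterministic order.

```python
import copy
from typing import Dict, Iterable, List, Tuple


class ProcessingUnit:
    """Stand-in for one processing unit with its own local repository."""

    def __init__(self, unit_id: int) -> None:
        self.unit_id = unit_id
        # Local storage that only this unit reads and writes.
        self.local_repository: Dict[str, dict] = {}

    def assemble(self, object_value: str,
                 referred_docs: Iterable[Tuple[str, str]]) -> dict:
        # Master role: combine the object and its referred documents
        # into one self-contained object keyed by document location.
        return {"value": object_value, "documents": dict(referred_docs)}

    def store(self, object_name: str, obj: dict) -> None:
        self.local_repository[object_name] = obj


def register(units: List[ProcessingUnit], object_name: str, object_value: str,
             referred_docs: Iterable[Tuple[str, str]],
             master_index: int = 0) -> int:
    """Register an object on every unit; return the id of the unit that
    finished last and therefore reports completion."""
    master = units[master_index]
    assembled = master.assemble(object_value, referred_docs)
    pending = len(units)
    completion_sender = -1
    for unit in units:  # the broadcast includes the master itself
        unit.store(object_name, copy.deepcopy(assembled))
        pending -= 1
        if pending == 0:
            completion_sender = unit.unit_id
    return completion_sender
```

After `register(units, "order_schema", ...)` returns, every unit holds an equal but independent copy of the object, so a later operation can read it from whichever local repository is closest to the work.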
- Furthermore, it should be noted that when an object has been registered with the
database system 1000, any operation (e.g., a DML that contains an XML operation requiring access to the XML object) can be accommodated simply by modifying its work steps to reference a specific one of the local repositories 1200 1-N of a particular processing unit 1110 1-N in order to access the registered object. - Also, a list or display of the registered objects can be accomplished by sending its work step to one of the processing units 1110 1-N that can, for example, be randomly selected. By way of example, a list or display of the registered objects for XML objects can be provided as:
- SHOW REGISTERED <XML SCHEMA|XSLT STYLESHEET|XQUERY MODULE> [object_name], where “object_name” is the registered object name which is specified when registering the object. If the “object_name” is absent, all registered XML object names for the specified type can be listed. Otherwise, the specified XML object value can be returned. Since each registered XML object may consist of multiple documents, the content of all of them will be displayed. It should also be noted that data consistency between object registrations, listing, displaying and operations requiring access to the repositories can, for example, be achieved by inserting one or more appropriate lock work steps in the operational flow (e.g., in DDL and XML DML operational flows). For example, a read table lock can be placed on a local storage 1200 i (shown in
FIG. 4) provided as an unhashed dictionary table, every time a DML XML operation, or a display object DDL statement, is issued that may require access to the unhashed dictionary table. Similarly, a write table lock can be placed on the unhashed dictionary table every time an XML registration DDL statement is issued. If a write lock finds that there are currently readers associated with the unhashed table, it can merely block and wait for all the readers to finish. - To elaborate still further,
FIG. 6 depicts a method 600 for providing data objects to a database system that includes multiple processing units in one or more database nodes in accordance with another embodiment of the invention. It should be noted that each of the database nodes can include at least one processor operable to process at least one portion of data for the database system. Also, at least one of the database nodes can include first and second processing units and first and second local storages that can be accessed only by the first and second processing units, respectively. Method 600 can, for example, be used by the database system 101 (shown in FIG. 3). - Referring to
FIG. 6, initially, it is determined (602) whether to register one or more objects. In effect, method 600 can wait for a determination (602) that one or more objects are to be registered. Typically, a database request can be made to register one or more database objects. The database request can provide the database objects and/or a reference to their location. As such, if it is determined (602) that one or more objects are to be registered, one or more database requests identifying one or more database objects and/or the database objects themselves (i.e., content of database object(s)) can be obtained (604) for registration. Next, one or more tasks or work steps for registering the database object(s) can be determined (606) for multiple processing nodes and transmitted (606) to one of the processing units that can, for example, be selected at random, to serve as a master processing unit with respect to other processing unit(s) and the determined (606) tasks or steps for registering the database object(s). - As suggested by
FIG. 6, the one or more tasks or work steps can be obtained (608) by the processing unit selected to serve as a master (master processing unit). The master processing unit can then broadcast (610) the task(s) or work step(s) to all processing units, including the master processing unit itself. For example, the processing units can be part of a database node of a multi-node database system. In any case, after the task(s) or work step(s) for registration of the database object(s) have been broadcast, each one of the processing units, including the master processing unit itself, can perform (612) the task(s) or work step(s) for registration of the database object(s) to effectively store one or more database objects in its local storage, which is not accessible to another processing unit. In effect, the processing unit determined (614) to be the last one to complete the task(s) or work step(s) for registration and storage of the one or more database object(s) can send back a completion indication so that it can be determined that the registration has been successfully completed before the method 600 ends. - It will readily be understood that in case of an error, an error message can be used, and the
method 600 may attempt to retry and/or take remedial action to correct a problem, although for the sake of brevity this is not depicted in FIG. 6. Those skilled in the art will also readily appreciate that a method can be provided to list one or more registered objects in accordance with one or more techniques noted above. - Generally, the various aspects, features, embodiments or implementations of the invention described above can be used alone or in various combinations. Furthermore, implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. 
A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
- A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CDROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, tactile or near-tactile input.
- Implementations of the subject matter described in this specification can be implemented in a computing system that includes a backend component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a frontend component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such backend, middleware, or frontend components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular implementations of the disclosure. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
- Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
- The various aspects, features, embodiments or implementations of the invention described above can be used alone or in various combinations. The many features and advantages of the present invention are apparent from the written description and, thus, it is intended by the appended claims to cover all such features and advantages of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, the invention should not be limited to the exact construction and operation as illustrated and described. Hence, all suitable modifications and equivalents may be resorted to as falling within the scope of the invention.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/044,654 US20180329970A1 (en) | 2012-11-29 | 2018-07-25 | Providing metadata to database systems and environments with multiple processing units or modules |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/688,767 US10042907B2 (en) | 2012-11-29 | 2012-11-29 | Providing metadata to database systems and environments with multiple processing units or modules |
US16/044,654 US20180329970A1 (en) | 2012-11-29 | 2018-07-25 | Providing metadata to database systems and environments with multiple processing units or modules |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/688,767 Continuation US10042907B2 (en) | 2012-11-29 | 2012-11-29 | Providing metadata to database systems and environments with multiple processing units or modules |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180329970A1 (en) | 2018-11-15 |
Family
ID=50774152
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/688,767 Active 2033-11-15 US10042907B2 (en) | 2012-11-29 | 2012-11-29 | Providing metadata to database systems and environments with multiple processing units or modules |
US16/044,654 Pending US20180329970A1 (en) | 2012-11-29 | 2018-07-25 | Providing metadata to database systems and environments with multiple processing units or modules |
Country Status (1)
Country | Link |
---|---|
US (2) | US10042907B2 (en) |
Also Published As
Publication number | Publication date |
---|---|
US10042907B2 (en) | 2018-08-07 |
US20140149349A1 (en) | 2014-05-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |