US20150120697A1 - System and method for analysis of a database proxy - Google Patents

System and method for analysis of a database proxy Download PDF

Info

Publication number
US20150120697A1
US20150120697A1 US14/064,210 US201314064210A US2015120697A1 US 20150120697 A1 US20150120697 A1 US 20150120697A1 US 201314064210 A US201314064210 A US 201314064210A US 2015120697 A1 US2015120697 A1 US 2015120697A1
Authority
US
United States
Prior art keywords
graph
tables
shard
query
vertices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/064,210
Inventor
Noam ASNER
Doron LEVARI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ignite Scalarc Solutions Inc
Original Assignee
ScaleBase Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ScaleBase Inc filed Critical ScaleBase Inc
Priority to US14/064,210 priority Critical patent/US20150120697A1/en
Assigned to SCALEBASE INC. reassignment SCALEBASE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ASNER, NOAM, LEVARI, DORON
Publication of US20150120697A1 publication Critical patent/US20150120697A1/en
Assigned to ScalArc, Inc. reassignment ScalArc, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SCALEBASE INC.
Priority to US14/985,277 priority patent/US10496645B1/en
Assigned to IGNITE SCALARC SOLUTIONS, INC. reassignment IGNITE SCALARC SOLUTIONS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ScalArc, Inc.
Abandoned legal-status Critical Current

Links

Images

Classifications

    • G06F17/30466
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24544Join order optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24558Binary matching operations
    • G06F16/2456Join operations

Definitions

  • Databases are used for storing digital data or information. In addition to storing data, modern database systems enable users and applications to search for, and retrieve stored data. Various systems and methods are used in order to improve and provide services. For example, database management systems (DBMSs) are designed to interact with users or applications and provide data storage and retrieval operations.
  • DBMSs database management systems
  • Horizontal partitioning is a database design or principal whereby rows of a table in a database are stored separately.
  • a database shard is a partition in a database.
  • a shard is a horizontal partition in a database.
  • an individual partition in a database may be referred to as a shard or a database shard.
  • shards may be used to partition a large database across a number of servers or partitions.
  • sharding is used in order to distribute or split data over multiple machines.
  • a system and method for analyzing database queries may include determining a set of tables referenced in a query; representing the set of tables as, or by, vertices of a graph; and if the graph is incomplete then determining the query is associated with a shard conflict.
  • a system and method may determine a query is not associated with a shard conflict if and only if the graph is complete.
  • a system and method may include storing a set of tables on at least one shard based on a common key.
  • a system and method may include determining a first and a third vertices are connected if the first and a second vertices on the graph are connected and the second and the third vertices on the graph are connected.
  • a system and method may include distributing a first and second tables over at least two shards based on a common key.
  • a common key used for distributing a first and second tables over at least two shards may be a table column.
  • a system and method may storing a first and second portions of a table on a respective first and second shards based on a respective first and second ranges of values of the common key.
  • a system and method may determine that data from at least two shards is required in order to complete a record in a response for a query if at least one vertex in the graph is not connected to at least one other vertex in the graph.
  • a method for determining a shard conflict may include distributing a plurality of tables across two or more shards according to a common key; receiving a query and determining a set of tables related to the query; representing the set of tables by vertices of a graph; and, if at least on vertex in the graph is not connected to at least one other vertex in the graph then determining the query is associated with a shard conflict.
  • a controller may be configured to determine a set of tables referenced in a query; a controller may be configured to determine which tables in the query are included in a set of relevant tables.
  • a controller may represent the set of tables by vertices of a graph; and if the graph is incomplete then determine the query is associated with a shard conflict.
  • a controller may be adapted to determine a query is not associated with a shard conflict if and only if the graph is complete.
  • FIG. 1 is a high level block diagram of a system according to embodiments of the present invention.
  • FIG. 2 shows distribution of tables across shards as known in the art
  • FIG. 3 shows distribution of tables across shards according to embodiments of the present invention
  • FIG. 4A graphically shows graphs according to embodiments of the present invention.
  • FIG. 4B graphically shows graphs according to embodiments of the present invention.
  • FIG. 4C graphically shows graphs according to embodiments of the present invention.
  • FIG. 4D graphically shows graphs according to embodiments of the present invention.
  • FIG. 5 shows pseudo code according to embodiments of the present invention.
  • FIG. 6 shows high level block diagram of an exemplary computing device according to embodiments of the present invention.
  • the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”.
  • the terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like.
  • the term set when used herein may include one or more items.
  • the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.
  • system 100 may include a network 110 that may be, may comprise or may be part of, a private or public internet protocol (IP) network, or the internet, or a combination thereof. Additionally or alternatively, network 110 may be, comprise or be part of, a global system for mobile communications (GSM) network.
  • network 110 may include or comprise an IP network such as the internet, a GSM related network and any equipment for bridging or otherwise connecting such networks as known in the art.
  • system 100 may include a plurality of storage systems.
  • Storage systems 140 and 150 may include or may be any suitable storage systems.
  • storage systems 140 and 150 include one or more hard disk drives, removable or fixed storage units.
  • storage systems 140 and 150 may be, or include, network storage devices and may be geographically distant from one another or from other components of system 100 .
  • Storage systems 140 and 150 may each include a server and/or a management system (e.g., a DBMS).
  • storage system 140 may include shard 145 and storage system 150 may include shard 155 .
  • Shards 145 and 155 may be shards as known in the art, e.g., associated with a partition or server in storage systems 140 and 150 respectively.
  • system 100 may include any number of storage systems that may be similar to storage systems 140 and 150 .
  • system 100 may include a database front end unit 120 , referred to herein as DFEU 120 .
  • a front end or proxy may provide an interface between a database and its clients.
  • a proxy may receive (or intercept) requests sent from a user or application to a database, examine the requests, possibly process the requests and either transfer the requests to the database or generate requests to the database based on received or intercepted requests.
  • a proxy may receive responses from a database, process the responses and send the (possibly processed) responses to clients of the database.
  • DFEU 120 may be a proxy or front end or it may be included in a proxy or front end.
  • DFEU 120 may be any module, unit or component designed and configured to perform functionalities and methods as described herein.
  • DFEU 120 is a computing device similar to computing device 600 including a memory and controller wherein the memory stores instructions that, when executed by the controller, cause the controller to determine whether or not a shard conflict exists.
  • a computing device similar to computing device 600 determines whether or not a shard conflict will be encountered when generating a response for a query.
  • a computing device similar to computing device 600 may perform analysis of join, merge or aggregation operations related to a database as described herein.
  • DFEU 120 may receive requests or queries from users or applications, determine whether or not a shard conflict exists with relation to a request or query, process the requests or queries and interact with storage systems 140 and 150 in order to respond to queries or perform related tasks. For example, a request for data stored in storage systems 140 and 150 may be sent (e.g., from a user or application) to DFEU 120 over network 110 .
  • DFEU 120 may process a request and, based on processing a request, DFEU 120 may interact with storage systems 140 and 150 in order to retrieve requested data or perform other operations.
  • DEFU 120 may send or otherwise provide retrieved or requested data to the requesting entity.
  • system 100 may include, or be connected to, a client computing device 130 .
  • Client computing device 130 may include or may be, for example, a server, a personal computer or a desktop computer, a mobile or laptop computer.
  • client computing device 130 may be a network device, a smartphone or a mobile phone or any other suitable computing device capable of at least communicating with DEFU 120 over network 110 .
  • Client computing device 130 may be any suitable computing device configured or adapted to enable a user or an application to interact with DEFU 120 .
  • a table is split into portions, and portions of the table are stored on more than one shard, then, in order to retrieve data from the table, a plurality of shards may need to be accessed. Accordingly, a system may need to join, aggregate or merge data portions retrieved from a plurality of shards in order to provide the requested data.
  • data from two tables is needed in order to complete a request for data and each of the two tables is stored on a different shard then two shards need to be accessed and data retrieved from the two shards needs to be merged, aggregated or joined.
  • the more shards involved in the process the greater the challenge.
  • FIG. 2 showing a distribution of tables across shards.
  • tables 215 , 216 , 225 and 226 which are tables related to departments and employees in an organization, are stored on shards 210 and 220 .
  • table 215 stored on shard 210 is a department table that includes a column with a department identification number (labeled “Dep. ID”), where an identification number of a department is stored. As shown, table 215 further includes a department name column (labeled “Dep. NAME”) and an additional column (labeled “No. of employees”) for the number of employees in each department. It will be understood that other or additional columns may be included in table 215 .
  • Table 225 stored on shard 220 is similar to table 215 . Other types of tables and other data may be stored.
  • table 216 stored on shard 210 is an employee table.
  • table 216 includes a column for employee identification (labeled “Employee ID”), a column with employee name (labeled “Employee NAME”) and a column that indicates the department to which the employee belongs (labeled “Dep. ID”). For example, as shown, employee Jack belongs to the operations department (Dep. ID “2”).
  • Table 226 stored on shard 220 is similar to table 216 .
  • Queries or requests from a database typically involve some logical relations between entities. For example, a user may request the list (e.g., names of employees) of all employees in the management department. Using the tables and shards shown by FIG. 2 , a system may have to retrieve data from both shards 210 and 220 in order to generate a response. For example, having found the employee named David in table 226 on shard 220 , a system now needs to access table 215 on shard 210 in order to find the name of the department with department ID 1 (the marketing department in the current example).
  • department ID 1 the marketing department in the current example.
  • a shard conflict exists if, for at least one record or other object in a response to a query, data from at least two shards must be retrieved.
  • a response or a result record may need to include an employee with the “Name” of the employee's department name. If the result record can be composed using data in a single shard then, as referred to herein, no shard conflict exists.
  • a cartesian product and possibly filtering of the cartesian product result
  • a system may determine a shard conflict exists. For example, when distributing tables over shards as shown in FIG.
  • a system may need to retrieve both table 215 and table 225 , generate a cartesian product as known in the art to produce a set of products and then filter out all products that do not match a criteria.
  • a cartesian product would include a set that includes all possible combinations of all employees and all departments.
  • the resulting (typically very large) set may subsequently need to be processed in order to filter out irrelevant products or identify relevant ones.
  • a plurality of tables are distributed across two or more shards according to a common key.
  • the key is referred to herein as a common key to indicate the key is common to the plurality of tables. For example, instead of storing tables 215 , 216 , 225 and 226 on shards 210 and 220 as shown in FIG. 2 , a different arrangement may be used.
  • FIG. 3 showing a distribution of tables across shards according to embodiments of the invention.
  • tables 315 , 320 , 335 and 340 tables 215 , 216 , 225 and 226 may be distributed on shards 310 and 330 based on a common (or same) key.
  • shards 310 and 330 may be stored on storage systems 140 and 150 respectively as shown by shards 145 and 155 .
  • the department identification (Dep. ID), key or value may be used in order to split a department table and store the resulting tables on shards 310 and 330 .
  • shard 310 may store a department table 315 that may include departments with a department identification in a specific range, e.g., one to two (1-->2).
  • an employees table 320 in shard 310 may include employees belonging to departments in departments table 315 .
  • employees belonging to departments with an identification key value in the specific range (1-->2) are included in employees table 320 stored on shard 310 .
  • shard 330 includes a departments table 335 that includes departments having a department key value or identification number that is in the range of three to four (3-->4) and shard 330 also includes an employees table 340 that includes employees associated with the respective range of departments. For example and as shown, employees table 340 includes employees that belong to departments having a department identification value in the range of three to four (3-->4).
  • employees and departments are shard tables, sharded by department identification. Any other key (or column in a table) may be used for splitting tables as exemplified by FIG. 3 . Any number of tables may be split and distributed or re-distributed across a plurality of a respective number of shards based on a common key or column as described herein.
  • an organization may maintain an expenses table that lists expenses per departments and per employees.
  • an employees table, departments table and expenses table may be distributed across a number of shards based on a common key (e.g., a department identification). Accordingly, details related to the employees and expenses of a specific department may be stored (and found) in the same shard that stores details of the specific department.
  • an embodiment may include a distribution table or list.
  • DFEU 120 may store and maintain a table or list similar to table 350 on storage 140 .
  • a distribution table or list may include, for each table, a key used for distributing the table and an identification of the table.
  • Other entries in a distribution list may be a key value or range of values, an identification of shards used for storing the table and the like.
  • names of tables are shown under table identification in table 350 , however, it will be understood that any identification may be used, e.g., an internal identification used by DFEU 120 .
  • entries in the key column in table 350 are “Dep. ID”, in accordance with examples described herein, however, it will be understood that any identification of a key may be used, e.g., an identification of keys as generated by DFEU 120 .
  • a distribution list or table maintained by DFEU 120 indicates that a common key to be used is the department identification in the departments table.
  • a distribution table may further indicate that the range of 1 to 2 is stored on shard 310 in tables 315 and 320 (depending on whether it is a department or employee), the range of 3 to 4 is stored on tables 335 and 340 shard 330 and so on.
  • two entries in table 350 may be related to departments of an organization and may include a respective first and second key ranges and a respective first and second shards. Accordingly, when new data is to be added to a table, DFEU 120 may consult a distribution table and determine which table on which shard is to be updated with the new data.
  • DFEU 120 may consult the distribution table in order to find the shard and table for the given key value (4 in this case) and determine that the new employee is to be added to table 340 on shard 330 since these are the table and shard for storing entries using the common key of department identification as described.
  • Embodiments of the invention may examine a query, request or any interaction with a database and determine whether or not a shard conflict is associated with the request, query or interaction.
  • query is used herein to refer to any message or request received, from a client (that may be a user or an application) by a database.
  • DFEU 120 examines a query and determines whether or not the query causes or generates a shard conflict. If a shard conflict is determined, DFEU 120 may perform a number of actions. For example, if a shard conflict is determined, the relevant query may be rejected or it may be passed on to an entity for further prosecution. In an embodiment, if a shard conflict is identified then the query may be sent to all relevant shards, responses from all relevant shards may be collected and a response to the query may be generated by joining, merging or aggregating records from the responses received from the relevant shards.
  • DFEU 120 may split a query into a number of queries which are then executed on a number of shards.
  • executing a query on a shard it is meant performing an operation based on the query with respect to data in the shard. For example, if the query includes searching for a data element then executing the query on a shard includes searching for the element in data stored in the shard.
  • Results from a number of shards may then be aggregated or joined as known in the art. Any system or method for splitting a query (after DFEU 120 determines that a shard conflict exists with respect to the query) may be used without departing from the scope of the invention.
  • a query or request may be executed separately on each relevant shard and results from a plurality of prosecutions on a plurality of shards may be unified to produce a response.
  • DFEU 120 may identify the relevant shard or shards and send the query to the relevant shards for further processing. For example, using entries in a table similar to table 350 described herein, DFEU 120 determines which shard stores the relevant table and sends the query to the shard for prosecution.
  • a request may not cause a shard conflict while still requiring accessing multiple shards. For example, if data is distributed as shown in FIG. 3 , a query for the list of ALL departments may require accessing more than one shard but no aggregation (or joining) of data from multiple shards is required in order to complete (or produce or generate) records in the response.
  • a statement may request data for a specific department and still cause a shard conflict. For example, a query for ALL employees of “Marketing” when distribution is as shown in FIG. 2 will cause a shard conflict since, in order to determine the name of departments for employees found in table 226 on shard 220 , data from table 215 on shard 210 needs to be used.
  • DFEU 120 may examine a query and determine whether or not all tables or other data constructs referenced in the query or relevant to the query are stored on the same shard. In an embodiment, DFEU 120 examines a query and determines which tables in the query are the set of relevant tables by identifying the set of tables which are referenced by the query or which are relevant to the query. For example, DFEU 120 identifies a “FROM” clause in a query. As known in the art, a “FROM” clause in a structured query language (SQL) indicates the tables from which data is requested. Accordingly, DFEU 120 may examine a query and may determine the tables that need to be accessed in order to generate a response for the query. It will be understood that DFEU 120 may be configured to parse any applicable query or request and that any method of identifying the relevant tables in a database may be used by embodiments of the invention.
  • SQL structured query language
  • shard table S i may be table 215 and SK i may be the department identification number or column as described herein.
  • a key is a column number.
  • DFEU 120 may check or determine whether or not the graph is complete. As described herein, in an embodiment, if DFEU 120 determines the graph is complete then DFEU 120 further determines no shard conflict exists with respect to the relevant query. In an embodiment, DFEU 120 determines a query is not associated with a shard conflict if, and only if, the graph is complete.
  • a graph is complete if each vertex on the graph is connected to all other vertices on the graph. As referred to herein, two vertices on a graph are connected if and only if there is an edge between them.
  • DFEU 120 determines whether or not a graph is complete. For example, DFEU 120 determines a graph is complete if there exists a path (comprising one or more edges) from any vertex on the graph to any other vertex on the graph. In an embodiment, DFEU 120 determines the graph is incomplete if, for at least one vertex on the graph, there exists no path of edges to another vertex on the graph.
  • DFEU 120 may determine no shard conflict exists with relation to the query. If the graph is incomplete, that is, at least one vertex on the graph is not connected by a line or edge (or a plurality of edges as described) to at least one other vertex on the graph, then DFEU 120 determines that the graph is incomplete and a shard conflict exists.
  • tables 215 , 216 , 225 and 226 are not distributed by a common or same key and, accordingly, if tables 215 , 216 , 225 and 226 are relevant to a query, then DFEU 120 determines a shard conflict exists (e.g., both shards 220 and 210 need to be accessed).
  • a shard conflict exists (e.g., both shards 220 and 210 need to be accessed).
  • the relevant tables are the employees and department tables
  • DFEU 120 identifies that these two tables are distributed across shards using the same or common key (the department ID as described) and therefore determines no shard conflict exists with respect to the query.
  • FIGS. 4A , 4 B, 4 C and 4 D graphically depict example graphs, vertices and edges according to embodiments of the invention.
  • DFEU 120 may determine a shard conflict exists. For example, as shown by FIG. 4D , vertex S4 is not connected to vertex S2, therefore, the graph shown in FIG. 4D is incomplete. For example, although the keys used for tables S 1 and S 2 are the same, and the keys used for tables S 3 and S 4 are the same, not all vertices are connected because the key used for tables S1 and S2 is not the same as the key used for tables S3 and S4. Accordingly, in the case shown by FIG. 4C DFEU 120 may determine a shard conflict exists. Otherwise described, in this example and as shown by FIG.
  • FIG. 5 showing pseudo code according to embodiments of the invention. Code similar to the pseudo code shown in FIG. 5 may be used by DFEU 120 in order to determine whether or not a shard conflict as described herein is expected when responding to, or otherwise processing, a query. As common in the art, the patterns “//”, “/*” and “*/” are used in FIG. 5 to indicate or denote comments in the code. Other languages and other specific code may be used.
  • each table may be assigned a unique identification or index, accordingly, the pseudo code shown in FIG. 5 is for tables numbered 0 to N ⁇ 1.
  • DFEU 120 when determining which tables are referenced in a query, DFEU 120 also may determine the number of tables and sets “N” to the correct number. In an embodiment, DFEU 120 associates each of the tables identified as related to the query with a unique value or number selected from values of 0 to (N ⁇ 1).
  • Blocks are used in the pseudo code shown in FIG. 5 to represent vertices in a graph, where vertices represent tables as described herein. Any suitable object may be used to implement blocks, e.g., a memory that stores a value.
  • each block represents one or more vertices.
  • each block is a singleton.
  • a singleton is a mathematical expression that describes a group consisting one element, in the present example, initially, a singleton describes a group consisting a single vertex. Accordingly, initially, a graph includes N vertices and no edges since no vertices are yet connected by edges.
  • blocks are orthogonal (e.g., a logical intersection of any two blocks is an empty group) and a unification of all blocks contains or represents all vertices.
  • a first and second tables e.g., table “i” and table “j” as shown
  • the blocks containing or representing the tables are joined by unifying them into one block.
  • unifying, joining or merging a first and second blocks into a third block connects two vertices groups respectively represented by the first and second blocks.
  • the third block now represents the two vertices groups previously represented by the first and second blocks.
  • blocks represent unconnected vertices or parts of the graph. For example, following the creation of the third block as described, no edges are determined between any blocks on the graph, e.g., between the third block and any other blocks representing vertices on the graph. Accordingly, in an embodiment, blocks represent unconnected parts of graph.
  • the pseudo code in FIG. 5 will return one (“1”) that may be interpreted as “TRUE” if only one block remains, indicating that no shard conflict exists with respect to a query related to the N tables identified. As shown, the pseudo code in FIG. 5 will return zero (“0”) that may be interpreted as “FALSE” if more than one block remains, indicating that a shard conflict exists, or is to be expected, with respect to the query.
  • each block is a set of N bits, each bit identified by its index and, in order for a block to contain or represent table “i”, bit “i” in the block is set.
  • bit “i” in the block is set.
  • a new block that has bits “i” and “j” set is created and the new block replaces the two blocks previously used for representing tables “i” and “j”. Accordingly, when identifying two tables distributed using the same key, the number of blocks is reduced by one since a single block is now used to represent both tables.
  • any applicable method or system may be used in order to represent vertices of a graph, determine edges connecting vertices, and determining whether or not a shard conflict is expected for a query by without departing from the scope of the invention.
  • other methods for representing vertices may use objects other than sets of bits as described herein, may use other methods for joining objects when an edge is identified etc.
  • DFEU 120 may analyze a number of relations between tables and determine whether or not a shard conflict exists or is to be expected based on an analysis of a number of relations or branches.
  • an “OR” condition may be in the form of: select an employee from tables S 1 , S 2 and S 3 where the same key is used for distributing tables S 1 and S 2 as described herein and, in addition, the same key is used for distributing tables S 1 and S 3 or the key used for distributing table S 2 is some column or key of S 3 .
  • condition or criteria may be expressed by:
  • S3.Col may be any column of S3 as specified in a query.
  • DFEU 120 may check the two conditions above as described herein and, only if both or all conditions are satisfied then DFEU 120 may determine that no shard conflict exists, otherwise, DFEU 120 may determine a shard conflict exists or is to be expected. Any other logical terms that may appear in a query (e.g., “if”, “else” and the like) may be parsed and analyzed by DFEU 120 and all logical branches or conditions may be examined as described herein. In an embodiment, DFEU 120 checks all possible or logical branches in a query, generates a set of logical statements and only if all logical statements indicate that no shard conflict exists then DFEU 120 determines no shard conflict is associated with responding to the query or request. Generally, DFEU 120 may ignore non-sharded tables except for special cases. For DFEU 120 may be configured to server queries or requests from a master or global table using a predefined logic or protocol.
  • Computing device 600 may include a controller 605 that may be, for example, a central processing unit processor (CPU), a chip or any suitable computing or computational device, an operating system 615 , a memory 620 , a storage 630 , an input devices 635 and an output devices 640 .
  • DFEU 120 includes components included in computing device 600 .
  • Operating system 615 may be a commercial operating system.
  • Memory 620 may be or may include, for example, a Random Access Memory (RAM), a Dynamic RAM (DRAM), a Flash memory, or other suitable memory units or storage units.
  • RAM Random Access Memory
  • DRAM Dynamic RAM
  • Flash memory or other suitable memory units or storage units.
  • Memory 620 may be or may include a plurality of, possibly different memory units.
  • Executable code 625 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 625 may be executed by controller 605 .
  • executable code 625 may be an application that performs operations described herein with respect to DFEU 120 .
  • Executable code 625 may be an application that performs the methods described herein.
  • Controller 605 may be configured to perform methods described herein for example by executing code or software, for example stored in memory 620 .
  • executable code 625 may carry out operations described herein in real-time.
  • Computing device 600 and executable code 625 may be configured to perform methods described herein in real-time.
  • computing device 600 and executable code 625 may update, process and/or act upon information at the same rate the information, or a relevant events, are received.
  • computing device 600 process queries and determines whether or not a shard conflict exists at the rate the queries are received from a plurality of client computing devices 130 .
  • more than one computing device 600 may be used.
  • a plurality of computing devices that include components similar to those included in computing device 600 may be connected to a network and used as a system.
  • a plurality of computing devices 600 may be used in order to deploy a plurality of DFEU 120 units, e.g., when scaling up a database by adding storage systems similar to storage systems 140 and 150 .
  • Storage 630 may be or may include, for example, a hard disk drive or other suitable removable and/or fixed storage units. Content may be stored in storage 630 and may be loaded from storage 630 into memory 620 where it may be processed by controller 605 .
  • Input devices 635 may be or may include a mouse, a keyboard, a touch screen or pad or any suitable input device.
  • Output devices 640 may include one or more displays, speakers and/or any other suitable output devices.
  • I/O devices may be connected to computing device 600 as shown by blocks 635 and 640 .
  • a wired or wireless network interface card NIC
  • NIC network interface card
  • Embodiments of the invention may include an article such as a computer or processor non-transitory readable medium, or a computer or processor non-transitory storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which, when executed by a processor or controller, carry out methods disclosed herein.
  • a storage medium such as memory 620
  • computer-executable instructions such as executable code 625
  • controller such as controller 605 .
  • a system may include components such as, but not limited to, a plurality of central processing units (CPU) or any other suitable multi-purpose or specific processors or controllers, a plurality of input units, a plurality of output units, a plurality of memory units, and a plurality of storage units.
  • a system may include or may be, a personal computer, a desktop computer, a mobile computer, a laptop computer, a notebook computer or any other suitable computing device.
  • the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed at the same point in time.

Abstract

A system and method for processing a database query may include determining a set of tables referenced in a query; representing the set of tables by vertices of a graph; and, if the graph is incomplete, then determining the query is associated with a shard conflict. A system and method may determine a query is not associated with a shard conflict if, and only if, the graph is complete.

Description

    BACKGROUND OF THE INVENTION
  • Databases are used for storing digital data or information. In addition to storing data, modern database systems enable users and applications to search for, and retrieve stored data. Various systems and methods are used in order to improve and provide services. For example, database management systems (DBMSs) are designed to interact with users or applications and provide data storage and retrieval operations.
  • Horizontal partitioning is a database design or principal whereby rows of a table in a database are stored separately. As known in the art, a database shard is a partition in a database. Specifically, a shard is a horizontal partition in a database. Accordingly, an individual partition in a database may be referred to as a shard or a database shard. As known in the art, shards may be used to partition a large database across a number of servers or partitions. Generally, sharding is used in order to distribute or split data over multiple machines.
  • However, if a table is split across a number of shards, when retrieving data from the table, data from multiple shards needs to be joined, aggregated or merged.
  • SUMMARY OF EMBODIMENTS OF THE INVENTION
  • A system and method for analyzing database queries may include determining a set of tables referenced in a query; representing the set of tables as, or by, vertices of a graph; and if the graph is incomplete then determining the query is associated with a shard conflict. A system and method may determine a query is not associated with a shard conflict if and only if the graph is complete. A system and method may include storing a set of tables on at least one shard based on a common key. A system and method may include determining a first and a third vertices are connected if the first and a second vertices on the graph are connected and the second and the third vertices on the graph are connected. A system and method may include distributing a first and second tables over at least two shards based on a common key. A common key used for distributing a first and second tables over at least two shards may be a table column.
  • A system and method may storing a first and second portions of a table on a respective first and second shards based on a respective first and second ranges of values of the common key. A system and method may determine that data from at least two shards is required in order to complete a record in a response for a query if at least one vertex in the graph is not connected to at least one other vertex in the graph. A method for determining a shard conflict may include distributing a plurality of tables across two or more shards according to a common key; receiving a query and determining a set of tables related to the query; representing the set of tables by vertices of a graph; and, if at least on vertex in the graph is not connected to at least one other vertex in the graph then determining the query is associated with a shard conflict.
  • A controller may be configured to determine a set of tables referenced in a query; a controller may be configured to determine which tables in the query are included in a set of relevant tables. A controller may represent the set of tables by vertices of a graph; and if the graph is incomplete then determine the query is associated with a shard conflict. A controller may be adapted to determine a query is not associated with a shard conflict if and only if the graph is complete.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanied drawings. Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate corresponding, analogous or similar elements, and in which:
  • FIG. 1 is a high level block diagram of a system according to embodiments of the present invention;
  • FIG. 2 shows distribution of tables across shards as known in the art;
  • FIG. 3 shows distribution of tables across shards according to embodiments of the present invention;
  • FIG. 4A graphically shows graphs according to embodiments of the present invention;
  • FIG. 4B graphically shows graphs according to embodiments of the present invention;
  • FIG. 4C graphically shows graphs according to embodiments of the present invention;
  • FIG. 4D graphically shows graphs according to embodiments of the present invention;
  • FIG. 5 shows pseudo code according to embodiments of the present invention; and
  • FIG. 6 shows high level block diagram of an exemplary computing device according to embodiments of the present invention.
  • It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn accurately or to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity, or several physical components may be included in one functional block or element. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
  • DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
  • Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” “establishing”, “analyzing”, “checking”, or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulates and/or transforms data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information non-transitory processor-readable storage medium that may store instructions, which when executed by the processor, cause the processor to perform operations and/or processes. Although embodiments of the invention are not limited in this regard, the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”. The terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. The term set when used herein may include one or more items. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.
  • Reference is now made to FIG. 1, which shows a high-level block diagram of a system 100 according to embodiments of the present invention. As shown, system 100 may include a network 110 that may be, may comprise or may be part of, a private or public internet protocol (IP) network, or the internet, or a combination thereof. Additionally or alternatively, network 110 may be, comprise or be part of, a global system for mobile communications (GSM) network. For example, network 110 may include or comprise an IP network such as the internet, a GSM related network and any equipment for bridging or otherwise connecting such networks as known in the art.
  • As shown by blocks 140 and 150, system 100 may include a plurality of storage systems. Storage systems 140 and 150 may include or may be any suitable storage systems. For example, storage systems 140 and 150 include one or more hard disk drives, removable or fixed storage units. In some embodiments, storage systems 140 and 150 may be, or include, network storage devices and may be geographically distant from one another or from other components of system 100. Storage systems 140 and 150 may each include a server and/or a management system (e.g., a DBMS). As shown, storage system 140 may include shard 145 and storage system 150 may include shard 155. Shards 145 and 155 may be shards as known in the art, e.g., associated with a partition or server in storage systems 140 and 150 respectively.
  • For the sake of clarity and simplicity, only one shard is shown in each of storage systems 140 and 150, however, it will be understood that any number of shards may be included in storage systems 140 and 150. Likewise, for the sake of clarity, only two storage systems (140 and 150) are shown, however, it will be understood that system 100 may include any number of storage systems that may be similar to storage systems 140 and 150.
  • As shown by block 120, system 100 may include a database front end unit 120, referred to herein as DFEU 120. As known in the art, a front end or proxy may provide an interface between a database and its clients. For example, a proxy may receive (or intercept) requests sent from a user or application to a database, examine the requests, possibly process the requests and either transfer the requests to the database or generate requests to the database based on received or intercepted requests. A proxy may receive responses from a database, process the responses and send the (possibly processed) responses to clients of the database. DFEU 120 may be a proxy or front end or it may be included in a proxy or front end. DFEU 120 may be any module, unit or component designed and configured to perform functionalities and methods as described herein. For example, in an embodiment, DFEU 120 is a computing device similar to computing device 600 including a memory and controller wherein the memory stores instructions that, when executed by the controller, cause the controller to determine whether or not a shard conflict exists. For example, a computing device similar to computing device 600 determines whether or not a shard conflict will be encountered when generating a response for a query.
  • A computing device similar to computing device 600 may perform analysis of join, merge or aggregation operations related to a database as described herein. DFEU 120 may receive requests or queries from users or applications, determine whether or not a shard conflict exists with relation to a request or query, process the requests or queries and interact with storage systems 140 and 150 in order to respond to queries or perform related tasks. For example, a request for data stored in storage systems 140 and 150 may be sent (e.g., from a user or application) to DFEU 120 over network 110. DFEU 120 may process a request and, based on processing a request, DFEU 120 may interact with storage systems 140 and 150 in order to retrieve requested data or perform other operations. DEFU 120 may send or otherwise provide retrieved or requested data to the requesting entity.
  • As shown, system 100 may include, or be connected to, a client computing device 130. Client computing device 130 may include or may be, for example, a server, a personal computer or a desktop computer, a mobile or laptop computer. In some embodiments, client computing device 130 may be a network device, a smartphone or a mobile phone or any other suitable computing device capable of at least communicating with DEFU 120 over network 110. Client computing device 130 may be any suitable computing device configured or adapted to enable a user or an application to interact with DEFU 120.
  • As discussed, if a table is split into portions, and portions of the table are stored on more than one shard, then, in order to retrieve data from the table, a plurality of shards may need to be accessed. Accordingly, a system may need to join, aggregate or merge data portions retrieved from a plurality of shards in order to provide the requested data. In other cases, if data from two tables is needed in order to complete a request for data and each of the two tables is stored on a different shard then two shards need to be accessed and data retrieved from the two shards needs to be merged, aggregated or joined. Clearly, the more shards involved in the process, the greater the challenge.
  • Reference is made to FIG. 2, showing a distribution of tables across shards. As shown, in an exemplary case, tables 215, 216, 225 and 226 which are tables related to departments and employees in an organization, are stored on shards 210 and 220.
  • As shown, table 215 stored on shard 210 is a department table that includes a column with a department identification number (labeled “Dep. ID”), where an identification number of a department is stored. As shown, table 215 further includes a department name column (labeled “Dep. NAME”) and an additional column (labeled “No. of employees”) for the number of employees in each department. It will be understood that other or additional columns may be included in table 215. Table 225 stored on shard 220 is similar to table 215. Other types of tables and other data may be stored.
  • As shown, table 216 stored on shard 210 is an employee table. As shown, table 216 includes a column for employee identification (labeled “Employee ID”), a column with employee name (labeled “Employee NAME”) and a column that indicates the department to which the employee belongs (labeled “Dep. ID”). For example, as shown, employee Jack belongs to the operations department (Dep. ID “2”). Table 226 stored on shard 220 is similar to table 216.
  • Queries or requests from a database typically involve some logical relations between entities. For example, a user may request the list (e.g., names of employees) of all employees in the management department. Using the tables and shards shown by FIG. 2, a system may have to retrieve data from both shards 210 and 220 in order to generate a response. For example, having found the employee named David in table 226 on shard 220, a system now needs to access table 215 on shard 210 in order to find the name of the department with department ID 1 (the marketing department in the current example).
  • As referred to herein, a shard conflict exists if, for at least one record or other object in a response to a query, data from at least two shards must be retrieved. For example and as described, a response or a result record may need to include an employee with the “Name” of the employee's department name. If the result record can be composed using data in a single shard then, as referred to herein, no shard conflict exists. However, if, for example, a cartesian product (and possibly filtering of the cartesian product result) of data from two or more shards are required then a system may determine a shard conflict exists. For example, when distributing tables over shards as shown in FIG. 2, to generate a record that includes both an employee and the name of the relevant department, a system may need to retrieve both table 215 and table 225, generate a cartesian product as known in the art to produce a set of products and then filter out all products that do not match a criteria.
  • For example, since it may not be known in advance which employee in tables 215 and 226 belongs to which department in table 215, a cartesian product would include a set that includes all possible combinations of all employees and all departments. The resulting (typically very large) set may subsequently need to be processed in order to filter out irrelevant products or identify relevant ones.
  • As discussed, there typically exists some logic or rational with respect to queries that require joining, aggregating or joining data items from two or more shards. According to an embodiment of the invention, a plurality of tables (or other constructs in a database or shards) are distributed across two or more shards according to a common key. When distributing a plurality of tables based on the same key, the key is referred to herein as a common key to indicate the key is common to the plurality of tables. For example, instead of storing tables 215, 216, 225 and 226 on shards 210 and 220 as shown in FIG. 2, a different arrangement may be used.
  • Reference is made to FIG. 3, showing a distribution of tables across shards according to embodiments of the invention. As shown by tables 315, 320, 335 and 340, tables 215, 216, 225 and 226 may be distributed on shards 310 and 330 based on a common (or same) key. For example, shards 310 and 330 may be stored on storage systems 140 and 150 respectively as shown by shards 145 and 155.
  • In an exemplary case or embodiment, the department identification (Dep. ID), key or value may be used in order to split a department table and store the resulting tables on shards 310 and 330. As shown, shard 310 may store a department table 315 that may include departments with a department identification in a specific range, e.g., one to two (1-->2). As shown, an employees table 320 in shard 310 may include employees belonging to departments in departments table 315. For example, only employees belonging to departments with an identification key value in the specific range (1-->2) are included in employees table 320 stored on shard 310. Similarly, shard 330 includes a departments table 335 that includes departments having a department key value or identification number that is in the range of three to four (3-->4) and shard 330 also includes an employees table 340 that includes employees associated with the respective range of departments. For example and as shown, employees table 340 includes employees that belong to departments having a department identification value in the range of three to four (3-->4).
  • Otherwise described, in this example, employees and departments are shard tables, sharded by department identification. Any other key (or column in a table) may be used for splitting tables as exemplified by FIG. 3. Any number of tables may be split and distributed or re-distributed across a plurality of a respective number of shards based on a common key or column as described herein. For example, an organization may maintain an expenses table that lists expenses per departments and per employees. In an embodiment, an employees table, departments table and expenses table may be distributed across a number of shards based on a common key (e.g., a department identification). Accordingly, details related to the employees and expenses of a specific department may be stored (and found) in the same shard that stores details of the specific department.
  • As shown by table 350, an embodiment may include a distribution table or list. For example, DFEU 120 may store and maintain a table or list similar to table 350 on storage 140. As shown by table 350, a distribution table or list may include, for each table, a key used for distributing the table and an identification of the table. Other entries in a distribution list may be a key value or range of values, an identification of shards used for storing the table and the like. For the sake of simplicity, names of tables are shown under table identification in table 350, however, it will be understood that any identification may be used, e.g., an internal identification used by DFEU 120. Similarly, as shown, entries in the key column in table 350 are “Dep. ID”, in accordance with examples described herein, however, it will be understood that any identification of a key may be used, e.g., an identification of keys as generated by DFEU 120.
  • For example, a distribution list or table maintained by DFEU 120 indicates that a common key to be used is the department identification in the departments table. Although not shown by table 350, a distribution table may further indicate that the range of 1 to 2 is stored on shard 310 in tables 315 and 320 (depending on whether it is a department or employee), the range of 3 to 4 is stored on tables 335 and 340 shard 330 and so on. For example, two entries in table 350 may be related to departments of an organization and may include a respective first and second key ranges and a respective first and second shards. Accordingly, when new data is to be added to a table, DFEU 120 may consult a distribution table and determine which table on which shard is to be updated with the new data.
  • For example, in the exemplary scenario described herein with reference to FIG. 3, if a new employee is to be added to the management department, DFEU 120 may consult the distribution table in order to find the shard and table for the given key value (4 in this case) and determine that the new employee is to be added to table 340 on shard 330 since these are the table and shard for storing entries using the common key of department identification as described.
  • Embodiments of the invention may examine a query, request or any interaction with a database and determine whether or not a shard conflict is associated with the request, query or interaction. For the sake of simplicity, the term query is used herein to refer to any message or request received, from a client (that may be a user or an application) by a database.
  • In an embodiment, DFEU 120 examines a query and determines whether or not the query causes or generates a shard conflict. If a shard conflict is determined, DFEU 120 may perform a number of actions. For example, if a shard conflict is determined, the relevant query may be rejected or it may be passed on to an entity for further prosecution. In an embodiment, if a shard conflict is identified then the query may be sent to all relevant shards, responses from all relevant shards may be collected and a response to the query may be generated by joining, merging or aggregating records from the responses received from the relevant shards.
  • For example, upon detecting a shard conflict, DFEU 120 (or another unit to which DFEU 120 may send a request that causes a shard conflict) may split a query into a number of queries which are then executed on a number of shards. By executing a query on a shard, it is meant performing an operation based on the query with respect to data in the shard. For example, if the query includes searching for a data element then executing the query on a shard includes searching for the element in data stored in the shard.
  • Results from a number of shards may then be aggregated or joined as known in the art. Any system or method for splitting a query (after DFEU 120 determines that a shard conflict exists with respect to the query) may be used without departing from the scope of the invention.
  • For example, if a shard conflict is determined or identified, a query or request may be executed separately on each relevant shard and results from a plurality of prosecutions on a plurality of shards may be unified to produce a response.
  • If DFEU 120 determines no shard conflict is related to a query (or, as described herein, If DFEU 120 determines no shard conflict exists), DFEU 120 may identify the relevant shard or shards and send the query to the relevant shards for further processing. For example, using entries in a table similar to table 350 described herein, DFEU 120 determines which shard stores the relevant table and sends the query to the shard for prosecution.
  • It will be noted that a request may not cause a shard conflict while still requiring accessing multiple shards. For example, if data is distributed as shown in FIG. 3, a query for the list of ALL departments may require accessing more than one shard but no aggregation (or joining) of data from multiple shards is required in order to complete (or produce or generate) records in the response. On the other hand, a statement may request data for a specific department and still cause a shard conflict. For example, a query for ALL employees of “Marketing” when distribution is as shown in FIG. 2 will cause a shard conflict since, in order to determine the name of departments for employees found in table 226 on shard 220, data from table 215 on shard 210 needs to be used.
  • As described, DFEU 120 may examine a query and determine whether or not all tables or other data constructs referenced in the query or relevant to the query are stored on the same shard. In an embodiment, DFEU 120 examines a query and determines which tables in the query are the set of relevant tables by identifying the set of tables which are referenced by the query or which are relevant to the query. For example, DFEU 120 identifies a “FROM” clause in a query. As known in the art, a “FROM” clause in a structured query language (SQL) indicates the tables from which data is requested. Accordingly, DFEU 120 may examine a query and may determine the tables that need to be accessed in order to generate a response for the query. It will be understood that DFEU 120 may be configured to parse any applicable query or request and that any method of identifying the relevant tables in a database may be used by embodiments of the invention.
  • In the discussion below, a query is related to, includes or references, a set of shard tables denoted here by {S0, . . . , Sn-1}, and, for each 0<=i<=N−1, SKi is the key based on which table Si is distributed across shards. For example, shard table Si may be table 215 and SKi may be the department identification number or column as described herein. For example, in an embodiment, a key is a column number.
  • DFEU 120 may represent the query as a graph where the tables are represented as vertices on the graph and for each binary predicate Si.SKi=Sj.SKj there's an edge between Si and Sj. The binary predicate Si.SKi=Sj.SKj as used herein is generally the statement the key SKi of table Si is the same as the key SKj of table Sj.
  • From transitivity, DFEU 120 may determine that if, Si.SKi=Sj.SKj and, in addition, Sj.SKj=Sm.SKm, then there are edges between Sj and Sm and between Si to Sm. To determine a shard conflict, DFEU 120 may check or determine whether or not the graph is complete. As described herein, in an embodiment, if DFEU 120 determines the graph is complete then DFEU 120 further determines no shard conflict exists with respect to the relevant query. In an embodiment, DFEU 120 determines a query is not associated with a shard conflict if, and only if, the graph is complete.
  • As referred to herein, a graph is complete if each vertex on the graph is connected to all other vertices on the graph. As referred to herein, two vertices on a graph are connected if and only if there is an edge between them.
  • In an embodiment, DFEU 120 determines whether or not a graph is complete. For example, DFEU 120 determines a graph is complete if there exists a path (comprising one or more edges) from any vertex on the graph to any other vertex on the graph. In an embodiment, DFEU 120 determines the graph is incomplete if, for at least one vertex on the graph, there exists no path of edges to another vertex on the graph.
  • In an embodiment, if all vertices representing tables on a graph are connected then DFEU 120 may determine no shard conflict exists with relation to the query. If the graph is incomplete, that is, at least one vertex on the graph is not connected by a line or edge (or a plurality of edges as described) to at least one other vertex on the graph, then DFEU 120 determines that the graph is incomplete and a shard conflict exists.
  • For example, in the case shown by FIG. 2, tables 215, 216, 225 and 226 are not distributed by a common or same key and, accordingly, if tables 215, 216, 225 and 226 are relevant to a query, then DFEU 120 determines a shard conflict exists (e.g., both shards 220 and 210 need to be accessed). In another case, e.g., as shown by FIG. 3, if the relevant tables are the employees and department tables, DFEU 120 identifies that these two tables are distributed across shards using the same or common key (the department ID as described) and therefore determines no shard conflict exists with respect to the query.
  • Reference is made to FIGS. 4A, 4B, 4C and 4D that graphically depict example graphs, vertices and edges according to embodiments of the invention. As shown by FIG. 4A, the relevant tables in a query are S1 and S2 and if the keys used to distribute tables S1 and S2 are the same (as indicated by S1.SK1=S2.SK2) then DFEU 120 may determine an edge exists between, or connects, the two vertices as shown and no shard conflict exists. Otherwise described, in this example and as shown by FIG. 4A, selecting * from S1 and S2 where S1.SK1=S2.SK2 does not result a shard conflict as indicated by the line connecting S1 and S2 (where “*” may be any element in S1 and S2).
  • As shown by FIG. 4B, since the key used for S1 is not the same as the one used for S2 (e.g., when the parameter or key used for distributing S1 over shards is not the same parameter or key used for distributing S2 over shards) no edge (or connection) exists between the vertices and therefore, in this example, DFEU 120 determines a shard conflict exists. Otherwise described, in this example and as shown by FIG. 4B selecting * from S1 and S2 where S1.SK1=S2.col will generate a shard conflict as indicated by an absence of a line connecting S1 and S2
  • As exemplified by FIG. 4C, since the keys used for tables S1, S2 and S3 are the same (as indicated by (S1.SK1=S2.SK2) and (S2.SK2=S3.SK3), the graph is complete, edges connect all vertices and therefore, in this case, DFEU 120 may determine no shard conflict arises. Note that if the key of tables S1 and S2 are the same and the keys of tables S2 and S3 then DFEU 120 may determine that the keys of S1 and S3 are the same as well. Otherwise described, in this example and as shown by FIG. 4C, selecting * from S1 join S2 on (S1.SK1=S2.SK2) join S3 on (S2.SK2=S3.SK3) does not generate or cause a shard conflict as indicated by the lines connecting S1, S2 and S3.
  • As shown by FIG. 4D, if only some of the vertices are connected DFEU 120 may determine a shard conflict exists. For example, as shown by FIG. 4D, vertex S4 is not connected to vertex S2, therefore, the graph shown in FIG. 4D is incomplete. For example, although the keys used for tables S1 and S2 are the same, and the keys used for tables S3 and S4 are the same, not all vertices are connected because the key used for tables S1 and S2 is not the same as the key used for tables S3 and S4. Accordingly, in the case shown by FIG. 4C DFEU 120 may determine a shard conflict exists. Otherwise described, in this example and as shown by FIG. 4D selecting * from S1 join S2 on (S1.SK1=S2.SK2) join S3 join S4 on (S3.SK3=S4.SK4) will cause or generate a shard conflict, for example, as indicated by absence of lines connecting S3 and S2 with S1 and S2.
  • Reference is made to FIG. 5 showing pseudo code according to embodiments of the invention. Code similar to the pseudo code shown in FIG. 5 may be used by DFEU 120 in order to determine whether or not a shard conflict as described herein is expected when responding to, or otherwise processing, a query. As common in the art, the patterns “//”, “/*” and “*/” are used in FIG. 5 to indicate or denote comments in the code. Other languages and other specific code may be used.
  • Generally, for “N” distributed tables, each table may be assigned a unique identification or index, accordingly, the pseudo code shown in FIG. 5 is for tables numbered 0 to N−1. For example, when determining which tables are referenced in a query, DFEU 120 also may determine the number of tables and sets “N” to the correct number. In an embodiment, DFEU 120 associates each of the tables identified as related to the query with a unique value or number selected from values of 0 to (N−1).
  • Blocks are used in the pseudo code shown in FIG. 5 to represent vertices in a graph, where vertices represent tables as described herein. Any suitable object may be used to implement blocks, e.g., a memory that stores a value.
  • In an embodiment, each block represents one or more vertices. In an embodiment, initially or at a first step, each block is a singleton. As known in the art, a singleton is a mathematical expression that describes a group consisting one element, in the present example, initially, a singleton describes a group consisting a single vertex. Accordingly, initially, a graph includes N vertices and no edges since no vertices are yet connected by edges.
  • For example, initially, the block consisting or representing table 1 is the singleton {1} and the block consisting or representing table n−3 is the singleton {n−3}. Accordingly, initially, for any vertexi, where: 0<=i<=N−1, the vertex is contained in (or represented by) the block {i}.
  • In an embodiment, at any given point of the process, blocks are orthogonal (e.g., a logical intersection of any two blocks is an empty group) and a unification of all blocks contains or represents all vertices.
  • As shown, if it is determined that a first and second tables (e.g., table “i” and table “j” as shown) are distributed using the same key then the blocks containing or representing the tables are joined by unifying them into one block. In one embodiment, unifying, joining or merging a first and second blocks into a third block connects two vertices groups respectively represented by the first and second blocks. As a result of merging or unifying a first and second blocks into a third block, the third block now represents the two vertices groups previously represented by the first and second blocks.
  • In an embodiment, blocks represent unconnected vertices or parts of the graph. For example, following the creation of the third block as described, no edges are determined between any blocks on the graph, e.g., between the third block and any other blocks representing vertices on the graph. Accordingly, in an embodiment, blocks represent unconnected parts of graph.
  • Accordingly and as further shown, if, at the end of the process, only one block remains then it is determined that no shard conflict exists. For example and as shown, the pseudo code in FIG. 5 will return one (“1”) that may be interpreted as “TRUE” if only one block remains, indicating that no shard conflict exists with respect to a query related to the N tables identified. As shown, the pseudo code in FIG. 5 will return zero (“0”) that may be interpreted as “FALSE” if more than one block remains, indicating that a shard conflict exists, or is to be expected, with respect to the query.
  • Blocks as described may be realized or implemented using any suitable method or means. For example, in an embodiment, each block is a set of N bits, each bit identified by its index and, in order for a block to contain or represent table “i”, bit “i” in the block is set. Accordingly, in such specific implementation or embodiment, in order to unify, join or merge two blocks representing tables “i” and “j”, a new block that has bits “i” and “j” set is created and the new block replaces the two blocks previously used for representing tables “i” and “j”. Accordingly, when identifying two tables distributed using the same key, the number of blocks is reduced by one since a single block is now used to represent both tables.
  • It will be understood that any applicable method or system may be used in order to represent vertices of a graph, determine edges connecting vertices, and determining whether or not a shard conflict is expected for a query by without departing from the scope of the invention. For example, other methods for representing vertices may use objects other than sets of bits as described herein, may use other methods for joining objects when an edge is identified etc.
  • Special cases may need to be addressed. For example, if, based on analyzing a query, DFEU 120 determines that an “OR” condition (as further described below) exists then DFEU 120 may analyze a number of relations between tables and determine whether or not a shard conflict exists or is to be expected based on an analysis of a number of relations or branches. For example, an “OR” condition may be in the form of: select an employee from tables S1, S2 and S3 where the same key is used for distributing tables S1 and S2 as described herein and, in addition, the same key is used for distributing tables S1 and S3 or the key used for distributing table S2 is some column or key of S3. For example, using the terminology used herein, such condition or criteria may be expressed by:
  • Select employee from S1, S2, S3 where S1.SK1=S2.SK2 AND ((S2.SK2=S3.SK3) OR (S2.SK2=S3.Col)). In this case, two branches or conditions that DFEU 120 may check are:
  • 1) S1.SK1=S2.SK2 AND S2.SK2=S3.SK3 and
  • 2) S1.SK1=S2.SK2 AND S2.SK2=S3.Col. where S3.Col may be any column of S3 as specified in a query.
  • Accordingly, DFEU 120 may check the two conditions above as described herein and, only if both or all conditions are satisfied then DFEU 120 may determine that no shard conflict exists, otherwise, DFEU 120 may determine a shard conflict exists or is to be expected. Any other logical terms that may appear in a query (e.g., “if”, “else” and the like) may be parsed and analyzed by DFEU 120 and all logical branches or conditions may be examined as described herein. In an embodiment, DFEU 120 checks all possible or logical branches in a query, generates a set of logical statements and only if all logical statements indicate that no shard conflict exists then DFEU 120 determines no shard conflict is associated with responding to the query or request. Generally, DFEU 120 may ignore non-sharded tables except for special cases. For DFEU 120 may be configured to server queries or requests from a master or global table using a predefined logic or protocol.
  • Reference is made to FIG. 6, showing high level block diagram of an exemplary computing device according to embodiments of the present invention. Computing device 600 may include a controller 605 that may be, for example, a central processing unit processor (CPU), a chip or any suitable computing or computational device, an operating system 615, a memory 620, a storage 630, an input devices 635 and an output devices 640. In an embodiment, DFEU 120 includes components included in computing device 600.
  • Operating system 615 may be a commercial operating system. Memory 620 may be or may include, for example, a Random Access Memory (RAM), a Dynamic RAM (DRAM), a Flash memory, or other suitable memory units or storage units. Memory 620 may be or may include a plurality of, possibly different memory units.
  • Executable code 625 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 625 may be executed by controller 605. For example, executable code 625 may be an application that performs operations described herein with respect to DFEU 120. Executable code 625 may be an application that performs the methods described herein. Controller 605 may be configured to perform methods described herein for example by executing code or software, for example stored in memory 620.
  • Where applicable, executable code 625 may carry out operations described herein in real-time. Computing device 600 and executable code 625 may be configured to perform methods described herein in real-time. For example, computing device 600 and executable code 625 may update, process and/or act upon information at the same rate the information, or a relevant events, are received. For example, in an embodiment, computing device 600 process queries and determines whether or not a shard conflict exists at the rate the queries are received from a plurality of client computing devices 130.
  • In some embodiments, more than one computing device 600 may be used. For example, a plurality of computing devices that include components similar to those included in computing device 600 may be connected to a network and used as a system. For example, a plurality of computing devices 600 may be used in order to deploy a plurality of DFEU 120 units, e.g., when scaling up a database by adding storage systems similar to storage systems 140 and 150.
  • Storage 630 may be or may include, for example, a hard disk drive or other suitable removable and/or fixed storage units. Content may be stored in storage 630 and may be loaded from storage 630 into memory 620 where it may be processed by controller 605. Input devices 635 may be or may include a mouse, a keyboard, a touch screen or pad or any suitable input device. Output devices 640 may include one or more displays, speakers and/or any other suitable output devices.
  • Any applicable input/output (I/O) devices may be connected to computing device 600 as shown by blocks 635 and 640. For example, a wired or wireless network interface card (NIC).
  • Embodiments of the invention may include an article such as a computer or processor non-transitory readable medium, or a computer or processor non-transitory storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which, when executed by a processor or controller, carry out methods disclosed herein. For example, a storage medium such as memory 620, computer-executable instructions such as executable code 625 and a controller such as controller 605.
  • A system according to embodiments of the invention may include components such as, but not limited to, a plurality of central processing units (CPU) or any other suitable multi-purpose or specific processors or controllers, a plurality of input units, a plurality of output units, a plurality of memory units, and a plurality of storage units. In some embodiments, a system may include or may be, a personal computer, a desktop computer, a mobile computer, a laptop computer, a notebook computer or any other suitable computing device. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed at the same point in time.
  • While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents may occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

Claims (18)

1. A method for analyzing join operations in a database proxy comprising plurality of tables, the method comprising:
identifying a set of N relevant shard tables referenced in a query, the query received by the proxy;
respectively assigning a set of indices {S0, . . . , SN-1} to the set of relevant shard tables, wherein the values of the indices are between 0 and N−1;
representing the set of relevant tables as vertices of a graph;
for (0<=i<=N−1) and (0<=j<=N−1):
denoting SKi and SKj as the keys based on which each respective table Si and Sj is distributed across shards;
defining an edge between vertices Si and Sj for each binary predicate Si.SKi=Sj.SKj, wherein key SKi of table Si equals key SKj of table Sj;
defining the graph is complete if there exists one or more edges from any of said vertices on the graph to any other vertex on the graph; and
if the graph is incomplete then determining the query is associated with a shard conflict.
2. The method of claim 1, comprising determining the query is not associated with a shard conflict if and only if the graph is complete.
3. The method of claim 1, comprising storing the set of tables on at least one shard based on a common key.
4. The method of claim 1, comprising determining a first and a third vertices are connected if the first and a second vertices on the graph are connected and the second and the third vertices on the graph are connected.
5. The method of claim 1, comprising distributing a first and second tables over at least two shards based on a common key.
6. The method of claim 3, wherein the common key is a column.
7. The method of claim 3, comprising storing a first and a second portion of a table on a respective first and a second shard based on a respective first and second ranges of values of the common key.
8. The method of claim 1, wherein if at least one vertex in the graph is not connected to at least one other vertex in the graph then determining that data from at least two shards is required in order to complete a record in a response for the query.
9. A method for determining a shard conflict, the method comprising:
distributing a plurality of tables across two or more shards according to a common key;
receiving a query and determining a set of N tables related to the query;
respectively assigning a set of indices {S0, . . . , SN-1} to the set of N tables, wherein the values of the indices are between 0 and N−1;
representing the set of tables by vertices of a graph;
for (0<=i<=N−1) and (0<=j<=N−1):
denoting SKi and SKj as the keys based on which each respective table Si and Sj is distributed across shards;
defining an edge between vertices Si and Sj for each binary predicate Si.SKi=Sj.SKj, wherein key SKi of table Si equals key SKj of table Sj;
defining the graph is complete if there exists one or more edges from any of said vertices on the graph to any other vertex on the graph; and
if at least one vertex in the graph is not connected to at least one other vertex in the graph then determining the query is associated with a shard conflict.
10. A system comprising:
a memory; and
a controller, the controller configured to:
determine a set of N, which are referenced in a query, are a set of relevant tables;
respectively assign a set of indices {S0, . . . , SN-1} to the set of N tables, wherein the values of the indices are between 0 and N−1;
represent the set of N tables by vertices of a graph;
for (0<=i<=N−1) and (0<=j<=N−1):
denote SKi and SKi as the keys based on which each respective table Si and Sj is distributed across shards;
define an edge between vertices Si and Sj for each binary predicate Si.SKi=Sj.SKj, wherein key SKi of table Si equals key SKj of table Sj;
define the graph is complete if there exists one or more edges from any of said vertices on the graph to any other vertex on the graph; and
if the graph is incomplete then determine the query is associated with a shard conflict.
11. The system of claim 10, wherein the controller is adapted to determine the query is not associated with a shard conflict if and only if the graph is complete.
12. The system of claim 10, wherein at least some tables included in the set of tables are stored on at least one shard based on a common key.
13. The system of claim 10, wherein the controller is adapted to determine a first and a third vertices are connected if the first and a second vertices on the graph are connected and the second and the third vertices on the graph are connected.
14. The system of claim 10, wherein at least a first and a second tables related to the query are distributed over at least two shards based on a common key.
15. The system of claim 12, wherein the common key is a column.
16. The system of claim 12, wherein a first and second portions of a table are stored on a respective first and second shards based on a respective first and second ranges of values of the common key.
17. The system of claim 10, wherein the controller is adapted to determine that data from at least two shards is required in order to complete a record in a response for the query if at least one vertex in the graph is not connected to at least one other vertex in the graph.
18. The system of claim 10, wherein the controller is adapted to distribute at least some of the tables across two or more shards according to a common key.
US14/064,210 2013-10-28 2013-10-28 System and method for analysis of a database proxy Abandoned US20150120697A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/064,210 US20150120697A1 (en) 2013-10-28 2013-10-28 System and method for analysis of a database proxy
US14/985,277 US10496645B1 (en) 2013-10-28 2015-12-30 System and method for analysis of a database proxy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/064,210 US20150120697A1 (en) 2013-10-28 2013-10-28 System and method for analysis of a database proxy

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/985,277 Continuation US10496645B1 (en) 2013-10-28 2015-12-30 System and method for analysis of a database proxy

Publications (1)

Publication Number Publication Date
US20150120697A1 true US20150120697A1 (en) 2015-04-30

Family

ID=52996627

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/064,210 Abandoned US20150120697A1 (en) 2013-10-28 2013-10-28 System and method for analysis of a database proxy
US14/985,277 Active 2036-02-02 US10496645B1 (en) 2013-10-28 2015-12-30 System and method for analysis of a database proxy

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/985,277 Active 2036-02-02 US10496645B1 (en) 2013-10-28 2015-12-30 System and method for analysis of a database proxy

Country Status (1)

Country Link
US (2) US20150120697A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170109762A1 (en) * 2015-10-19 2017-04-20 Yeon Tae KIM Omni-channel marketing curation system based on big data
CN108132838A (en) * 2016-11-30 2018-06-08 华为技术有限公司 A kind of method, apparatus and system of diagram data processing
US20180307717A1 (en) * 2017-04-21 2018-10-25 Microsoft Technology Licensing, Llc Trigger System For Databases Using Proxy
US10834055B2 (en) 2016-08-29 2020-11-10 Microsoft Technology Licensing, Llc. Cross-tenant data leakage isolation
US11030247B2 (en) * 2017-12-29 2021-06-08 Electronic Arts Inc. Layered graph data structure

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11032361B1 (en) * 2020-07-14 2021-06-08 Coupang Corp. Systems and methods of balancing network load for ultra high server availability

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140108421A1 (en) * 2012-10-04 2014-04-17 Codefutures Corporation Partitioning database data in a sharded database

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5625811A (en) 1994-10-31 1997-04-29 International Business Machines Corporation Method and system for database load balancing
US5799306A (en) 1996-06-21 1998-08-25 Oracle Corporation Method and apparatus for facilitating data replication using object groups
US6950848B1 (en) 2000-05-05 2005-09-27 Yousefi Zadeh Homayoun Database load balancing for multi-tier computer systems
US6801921B2 (en) 2000-09-08 2004-10-05 Hitachi, Ltd. Method and system for managing multiple database storage units
US6873989B1 (en) 2000-10-04 2005-03-29 Bmc Software, Inc. Graphical display of IMS space usage characteristics
US6691101B2 (en) 2001-06-21 2004-02-10 Sybase, Inc. Database system providing optimization of group by operator over a union all
US7539665B2 (en) 2001-10-23 2009-05-26 Visto Corporation System and method for merging remote and local data in a single user interface
US7249118B2 (en) 2002-05-17 2007-07-24 Aleri, Inc. Database system and methods
US7386554B2 (en) 2002-09-03 2008-06-10 Infoglide Software Corporation Remote scoring and aggregating similarity search engine for use with relational databases
US7698338B2 (en) 2002-09-18 2010-04-13 Netezza Corporation Field oriented pipeline architecture for a programmable data streaming processor
JPWO2004063928A1 (en) 2003-01-14 2006-05-18 アクセリア株式会社 Database load reduction system and load reduction program
US20050021511A1 (en) 2003-07-24 2005-01-27 Etagon Israel Ltd. System and method for load balancing in database queries
JP4129819B2 (en) 2003-10-06 2008-08-06 インターナショナル・ビジネス・マシーンズ・コーポレーション Database search system, search method thereof, and program
US7644087B2 (en) 2005-02-24 2010-01-05 Xeround Systems Ltd. Method and apparatus for data management
US8130758B2 (en) 2005-06-27 2012-03-06 Bank Of America Corporation System and method for low latency market data
US7860865B2 (en) 2005-12-19 2010-12-28 Yahoo! Inc. System of a hierarchy of servers for query processing of column chunks in a distributed column chunk data store
US8209696B2 (en) 2006-02-13 2012-06-26 Teradata Us, Inc. Method and system for load balancing a distributed database
US20070288530A1 (en) 2006-06-08 2007-12-13 Xeround Systems Ltd. Method and a system for backing up data and for facilitating streaming of records in replica-based databases
EP2067104A1 (en) 2006-09-28 2009-06-10 Xeround Systems Ltd. Apparatus and method for a distributed storage global database
US20080140696A1 (en) 2006-12-07 2008-06-12 Pantheon Systems, Inc. System and method for analyzing data sources to generate metadata
JP4352079B2 (en) 2007-03-28 2009-10-28 株式会社東芝 System, apparatus, and method for retrieving information from a distributed database
CN101657791A (en) 2007-04-13 2010-02-24 汤姆逊许可证公司 Enhanced database scheme to support advanced media production and distribution
US20090012932A1 (en) 2007-07-03 2009-01-08 Xeround Systems Ltd. Method and System For Data Storage And Management
US8676746B2 (en) 2008-01-03 2014-03-18 Microsoft Corporation Database management system risk assessment
US8364690B2 (en) 2008-02-21 2013-01-29 International Business Machines Corporation Automatic creation of pre-condition queries for conflict detection in distributed processing systems
TWI476610B (en) 2008-04-29 2015-03-11 Maxiscale Inc Peer-to-peer redundant file server system and methods
US20090319344A1 (en) 2008-06-18 2009-12-24 Tepper Samuel R Assessment of sales force personnel for improvement of sales performance
US20100017395A1 (en) 2008-07-16 2010-01-21 Sapphire Information Systems Ltd. Apparatus and methods for transforming relational queries into multi-dimensional queries
US8577941B2 (en) 2008-08-27 2013-11-05 Red Hat, Inc. Virtual database system
WO2010037065A2 (en) 2008-09-26 2010-04-01 Cmi Corporate Marketing D/B/A Prelude Innovations, Inc. Scalable relational database replication
US8458166B2 (en) 2008-10-09 2013-06-04 International Business Machines Corporation Dynamic context definitions in distributed databases
US8214352B2 (en) 2008-11-26 2012-07-03 Hewlett-Packard Development Company Modular query optimizer
US8438177B2 (en) 2008-12-23 2013-05-07 Apple Inc. Graphical result set representation and manipulation
US9229982B2 (en) 2008-12-23 2016-01-05 SAP France S.A. Processing queries using oriented query paths
EP2224358A1 (en) 2009-02-27 2010-09-01 AMADEUS sas Graphical user interface for search request management
US8832074B2 (en) 2009-10-21 2014-09-09 Teradata Us, Inc. System, method and computer-readable medium for dynamic skew avoidance for generic queries
US20110246550A1 (en) 2010-04-02 2011-10-06 Levari Doron System and method for aggregation of data from a plurality of data sources
US8805784B2 (en) 2010-10-28 2014-08-12 Microsoft Corporation Partitioning online databases
US8996505B1 (en) 2012-03-30 2015-03-31 Pivotal Software, Inc. Data movement driven automatic data placement
US9087105B2 (en) 2012-10-04 2015-07-21 Adobe Systems Incorporated Rule-based extraction, transformation, and loading of data between disparate data sources
US10303702B2 (en) 2014-02-07 2019-05-28 Ignite Scalarc Solutions, Inc. System and method for analysis and management of data distribution in a distributed database environment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140108421A1 (en) * 2012-10-04 2014-04-17 Codefutures Corporation Partitioning database data in a sharded database

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Asplund, "Edge Coloring BIBDS and constructing MOELRs", 2010 *
Faulstich et al., "Implementing Linguistic Query Languages Using LoTos", 2005 *
Rohloff et al., "High-Performance, Massively Scalable Distributed Systems using the MapReduce Software Framework: The SHARD Triple-Store, 2010 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170109762A1 (en) * 2015-10-19 2017-04-20 Yeon Tae KIM Omni-channel marketing curation system based on big data
US10657546B2 (en) * 2015-10-19 2020-05-19 Yeon Tae KIM Omni-channel marketing curation system based on big data
US10834055B2 (en) 2016-08-29 2020-11-10 Microsoft Technology Licensing, Llc. Cross-tenant data leakage isolation
CN108132838A (en) * 2016-11-30 2018-06-08 华为技术有限公司 A kind of method, apparatus and system of diagram data processing
US11256749B2 (en) 2016-11-30 2022-02-22 Huawei Technologies Co., Ltd. Graph data processing method and apparatus, and system
US20180307717A1 (en) * 2017-04-21 2018-10-25 Microsoft Technology Licensing, Llc Trigger System For Databases Using Proxy
US10558641B2 (en) * 2017-04-21 2020-02-11 Microsoft Technology Licensing, Llc Trigger system for databases using proxy
US11030247B2 (en) * 2017-12-29 2021-06-08 Electronic Arts Inc. Layered graph data structure

Also Published As

Publication number Publication date
US10496645B1 (en) 2019-12-03

Similar Documents

Publication Publication Date Title
US10496645B1 (en) System and method for analysis of a database proxy
US10725981B1 (en) Analyzing big data
US9361320B1 (en) Modeling big data
US9418101B2 (en) Query optimization
US10380187B2 (en) System, method, and recording medium for knowledge graph augmentation through schema extension
CN103620601A (en) Joining tables in a mapreduce procedure
US8560519B2 (en) Indexing and searching employing virtual documents
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
Sehgal et al. Sentiment analysis of big data applications using Twitter Data with the help of HADOOP framework
US10055482B2 (en) Knowledge engine for managing massive complex structured data
US9201700B2 (en) Provisioning computer resources on a network
WO2015074477A1 (en) Path analysis method and apparatus
US20200250192A1 (en) Processing queries associated with multiple file formats based on identified partition and data container objects
US11954086B2 (en) Index data structures and graphical user interface
US7925617B2 (en) Efficiency in processing queries directed to static data sets
CN111400301A (en) Data query method, device and equipment
CN104050264A (en) Method and device for generating SQL statement
US20210026889A1 (en) Accelerated large-scale similarity calculation
US11847121B2 (en) Compound predicate query statement transformation
CN115658680A (en) Data storage method, data query method and related device
CN108804502A (en) Big data inquiry system, method, computer equipment and storage medium
US11423098B2 (en) Method and apparatus to generate a simplified query when searching for catalog items
Arora et al. Big data technologies: brief overview
Hashem et al. A review of modeling toolbox for BigData
CN112667663A (en) Data query method and system

Legal Events

Date Code Title Description
AS Assignment

Owner name: SCALEBASE INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ASNER, NOAM;LEVARI, DORON;REEL/FRAME:032180/0822

Effective date: 20131017

AS Assignment

Owner name: SCALARC, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCALEBASE INC.;REEL/FRAME:036293/0450

Effective date: 20150809

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: IGNITE SCALARC SOLUTIONS, INC., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCALARC, INC.;REEL/FRAME:044925/0407

Effective date: 20180118