US20160140235A1 - Real-time distributed in memory search architecture - Google Patents
- Publication number
- US20160140235A1 (application US14/920,202)
- Authority
- US
- United States
- Prior art keywords
- search
- type
- communication link
- network segment
- manager
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G06F17/30864—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- the present disclosure relates in general to in-memory databases, and more specifically to hardware configurations of use in in-memory databases.
- a database is an organized collection of information stored as “records” having “fields” of information.
- a restaurant database may have a record for each restaurant in a region, where each record contains fields describing characteristics of the restaurant, such as name, address, type of cuisine, and the like.
- In operation, a database management system frequently needs to retrieve data from or persist data to storage devices such as disks. Unfortunately, access to such storage devices can be somewhat slow.
- databases typically employ a “cache” or “buffer cache” which is a section of relatively faster memory (e.g., random access memory (RAM)) allocated to store recently used data objects.
- Memory is typically provided on semiconductor or other electrical storage media and is coupled to a CPU (central processing unit) via a fast data bus which enables data maintained in memory to be accessed more rapidly than data stored on disks.
- One approach that may be taken when attempting to solve this problem is to store all the information in the database in memory. However, because memory provided on computer systems has a limited size, there are a number of obstacles that must be faced when attempting to handle databases of a larger scale. Some of these obstacles may include determining the technologies required to operate the database, including the networking needed, the hardware required for different nodes, and others.
- connection configurations for nodes of a system hosting an in-memory database having multiple connection bandwidth and latency tiers, where a first bandwidth tier may be associated with a bandwidth higher than a second bandwidth tier, the second bandwidth tier may be associated with a bandwidth higher than a third bandwidth tier, the third bandwidth tier may be associated with a bandwidth higher than a fourth bandwidth tier, and the first latency tier may be associated with a latency lower than the second latency tier.
- the system includes connection configurations having a suitable number of network segments, where network segments may be connected to a number of servers internal and external to the system, and to clusters of servers in the system.
- the servers of the system may include software modules such as search managers, analytics agents, search conductors, dependency managers, supervisors, and partitioners, amongst others.
- Servers and modules may be connected to the desired network segments to achieve desired bandwidth and latency needs.
- Servers and modules may be connected to the desired network segments to separate different classes of network traffic, to prevent one class of traffic from interfering with another.
- a system comprising one or more nodes hosting an in-memory database
- the system comprises a plurality of storage nodes comprising non-transitory machine-readable storage medium storing one or more partitions of a collection, wherein the collection stored by each respective storage node contains one or more records of a database, and wherein the storage medium of each respective storage node comprises main memory;
- a search manager node comprising a processor generating one or more search conductor queries using a search query received from a user node, transmitting the one or more search conductor queries to one or more search conductor nodes according to the search query, and forwarding one or more sets of search results to one or more analytics agent nodes according to the search query responsive to receiving the one or more sets of search results;
- an analytics agent node comprising a processor executing one or more analytics algorithms responsive to receiving a set of search results from the search manager node;
- a search conductor node comprising a processor querying the collection of the database records of a storage node according to a search
- FIG. 1 is a connection diagram for a computing system hosting an in-memory database system, in which the nodes are logically clustered.
- Node refers to a computer hardware configuration suitable for running one or more modules.
- Cluster refers to a set of one or more nodes.
- Module refers to a computer software component suitable for carrying out one or more defined tasks.
- Collection refers to a discrete set of records.
- Record refers to one or more pieces of information that may be handled as a unit.
- Field refers to one data element within a record.
- Partition refers to an arbitrarily delimited portion of records of a collection.
- Search Manager refers to a module configured to at least receive one or more queries and return one or more search results.
- Analytics Agent refers to a module configured to at least receive one or more records, process said one or more records, and return the resulting one or more processed records.
- Search Conductor refers to a module configured to at least run one or more queries on a partition and return the search results to one or more search managers.
- Node Manager refers to a module configured to at least perform one or more commands on a node and communicate with one or more supervisors.
- Supervisor refers to a module configured to at least communicate with one or more components of a system and determine one or more statuses.
- Heartbeat refers to a signal communicating at least one or more statuses to one or more supervisors.
- Partitioner refers to a module configured to at least divide one or more collections into one or more partitions.
- Dependency Manager refers to a module configured to at least include one or more dependency trees associated with one or more modules, partitions, or suitable combinations, in a system; to at least receive a request for information relating to any one or more suitable portions of said one or more dependency trees; and to at least return one or more configurations derived from said portions.
- Database refers to any system including any combination of clusters and modules suitable for storing one or more collections and suitable to process one or more queries.
- Query refers to a request to retrieve information from one or more suitable databases.
- Memory refers to any hardware component suitable for storing information and retrieving said information at a sufficiently high speed.
- “Fragment” refers to separating records into smaller records until a desired level of granularity is achieved.
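The definitions of record, field, collection, partition, and Partitioner above can be pictured with a minimal Python sketch. The names `Record` and `partition_collection` and the round-robin split are illustrative assumptions; the disclosure only states that a partition is an arbitrarily delimited portion of the records of a collection.

```python
from typing import Dict, List

# A record is handled as a unit; each key/value pair stands for one field.
Record = Dict[str, str]


def partition_collection(collection: List[Record], n_partitions: int) -> List[List[Record]]:
    """Divide a collection into n_partitions partitions.

    Round-robin assignment is an assumption made for this sketch; any
    delimiting rule would satisfy the definition of a partition.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    partitions: List[List[Record]] = [[] for _ in range(n_partitions)]
    for index, record in enumerate(collection):
        partitions[index % n_partitions].append(record)
    return partitions


# Hypothetical collection of restaurant records.
restaurants: List[Record] = [
    {"name": "A", "cuisine": "thai"},
    {"name": "B", "cuisine": "pizza"},
    {"name": "C", "cuisine": "thai"},
]
partitions = partition_collection(restaurants, 2)
```

Each resulting partition could then be loaded by a separate search conductor.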
- the present disclosure describes hardware configurations that may be implemented in distributed computing systems hosting an in-memory database, sometimes referred to as “in-memory database architectures.”
- An in-memory database is a database storing data in records controlled by a database management system (DBMS) configured to store data records in a device's main memory, as opposed to conventional databases and DBMS modules that store data in “disk” memory.
- Conventional disk storage requires processors (CPUs) to execute read and write commands to a device's hard disk, thus requiring CPUs to execute instructions to locate (i.e., seek) and retrieve the memory location for the data, before performing some type of operation with the data at that memory location.
- In-memory database systems access data that is placed into main memory, and then addressed accordingly, thereby mitigating the number of instructions performed by the CPUs and eliminating the seek time associated with CPUs seeking data on hard disk.
- In-memory databases may be implemented in a distributed computing architecture, which may be a computing system comprising one or more nodes configured to aggregate the nodes' respective resources (e.g., memory, disks, processors).
- a computing system hosting an in-memory database may distribute and store data records of the database among one or more nodes.
- these nodes are formed into “clusters” of nodes.
- these clusters of nodes store portions, or “collections,” of database information.
- suitable connection configurations may include a suitable number of network segments, where network segments may be connected to external servers, a first cluster including one or more search managers, a second cluster including one or more analytics agents, a third cluster including one or more search conductors, a fourth cluster including one or more dependency managers, a fifth cluster including one or more supervisors, and a sixth cluster including one or more partitioners.
- modules may be connected to the network segments using a desired bandwidth and latency tier.
- nodes of use in the first, second, third, fourth, fifth, and sixth cluster, as well as nodes of use in other clusters, may include hardware components suitable for running one or more types of modules.
- One or more suitable hardware components included in said clusters include CPUs, Memory, and Hard Disk, amongst others.
- the number of nodes used in a system hosting an in-memory database may be sufficiently high so as to allow the system hosting the in-memory database to operate at a desired capacity with a desired rate of growth, where the number of nodes in the system may allow the system to grow for a lead time required to acquire hardware.
- configurations disclosed here may reduce costs associated with deploying one or more suitable in-memory databases.
- FIG. 1 shows Connection Diagram 100 having Line Type A 102, Line Type B 104, Line Type C 106, Line Type D 108, First Network Segment 110, Second Network Segment 112, Third Network Segment 114, First Search Manager 120, nth Search Manager 122, First Analytics Agent 130, nth Analytics Agent 132, First Search Conductor 140, nth Search Conductor 142, Partitioner 150, First Dependency Manager 160, nth Dependency Manager 162, First Supervisor 170, and nth Supervisor 172.
- Line Type A 102 may represent a connection having a first bandwidth tier and a first latency tier
- Line Type B 104 may represent a connection having a second bandwidth tier and the first latency tier
- Line Type C 106 may represent a connection having a third bandwidth tier and a second latency tier
- Line Type D 108 may represent a connection having the fourth bandwidth tier and the second latency tier.
- the first bandwidth tier may be associated with a bandwidth higher than the second bandwidth tier
- the second bandwidth tier may be associated with a bandwidth higher than the third bandwidth tier
- the third bandwidth tier may be associated with a bandwidth higher than the fourth bandwidth tier
- the first latency tier may be associated with a latency lower than the second latency tier.
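The tier assignments above can be summarized in a small Python sketch. The `LineType` class and the integer encoding (a lower number means a higher-bandwidth or lower-latency tier) are assumptions made for illustration; the disclosure defines only the relative ordering of the tiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineType:
    name: str
    bandwidth_tier: int  # 1 = highest bandwidth tier, 4 = lowest
    latency_tier: int    # 1 = lower-latency tier, 2 = higher-latency tier


# Tier assignments as described for Line Types A through D.
LINE_TYPES = {
    "A": LineType("A", bandwidth_tier=1, latency_tier=1),
    "B": LineType("B", bandwidth_tier=2, latency_tier=1),
    "C": LineType("C", bandwidth_tier=3, latency_tier=2),
    "D": LineType("D", bandwidth_tier=4, latency_tier=2),
}


def has_more_bandwidth(a: LineType, b: LineType) -> bool:
    """True when a sits in a higher bandwidth tier than b."""
    return a.bandwidth_tier < b.bandwidth_tier


def has_lower_latency(a: LineType, b: LineType) -> bool:
    """True when a sits in a lower latency tier than b."""
    return a.latency_tier < b.latency_tier
```

Note that Line Types A and B share a latency tier, as do C and D, so `has_lower_latency` returns false within each pair.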
- a First Network Segment 110 may be connected to external servers using any suitable connection, including Line Type A 102, Line Type B 104, and Line Type C 106.
- First Network Segment 110 may also be connected to a first cluster including a First Search Manager 120 and up to an nth Search Manager 122 using a Line Type A 102 connection.
- a Second Network Segment 112 may be connected to the first cluster including First Search Manager 120 and up to nth Search Manager 122 using a Line Type A 102 connection. Second Network Segment 112 may also be connected to a second cluster including a First Analytics Agent 130 and up to an nth Analytics Agent 132 using a Line Type A 102 connection, a third cluster including a First Search Conductor 140 up to an nth Search Conductor 142 using a Line Type B 104 connection, a fourth cluster including a First Dependency Manager 160 up to nth Dependency Manager 162 using a Line Type D 108 connection, and a fifth cluster including a First Supervisor 170 up to nth Supervisor 172 using a Line Type D 108 connection.
- the bandwidth tier of Line Type A 102 may be sufficient for ensuring the first cluster including First Search Manager 120 and up to nth Search Manager 122 is able to at least receive an appropriate amount of information from a suitable number of search conductors in the third cluster including First Search Conductor 140 up to an nth Search Conductor 142.
- the latency tier of Line Type A 102 may be sufficiently low so as to at least allow the system to be responsive enough to carry out a desired number of queries.
- the bandwidth tier of Line Type B 104 may be sufficient for ensuring search conductors in the third cluster including First Search Conductor 140 up to an nth Search Conductor 142 are able to at least return a desired size of results.
- the latency tier of Line Type B 104 may be sufficiently low so as to at least allow the system to be responsive enough to carry out a desired number of queries.
- the bandwidth tier of Line Type D 108 may be sufficient for ensuring dependency managers in the fourth cluster including First Dependency Manager 160 up to nth Dependency Manager 162 are able to at least receive a desired number of package requests and return a desired number of packages. Additionally, the bandwidth tier of Line Type D 108 may be sufficient for ensuring supervisors in the fifth cluster including First Supervisor 170 up to nth Supervisor 172 are able to at least monitor and manage a desired number of nodes and modules. The latency tier of Line Type D 108 may be sufficiently low so as to at least allow the system to be managed in a desired period of time and to provide a desired monitoring frequency.
- a Third Network Segment 114 may be connected to the third cluster including a First Search Conductor 140 up to an nth Search Conductor 142 using a Line Type C 106 connection, the fourth cluster including a First Dependency Manager 160 up to nth Dependency Manager 162 using a Line Type D 108 connection, the fifth cluster including a First Supervisor 170 up to nth Supervisor 172 using a Line Type D 108 connection, and a sixth cluster including one or more Partitioners 150 using a Line Type C 106 connection.
- the bandwidth tier of Line Type B 104 may be sufficient for ensuring one or more Partitioners 150 are able to at least access a desired collection and output a desired number of partitions within a desired period of time. Additionally, the bandwidth tier of Line Type B 104 may be sufficient for ensuring the first cluster including First Search Manager 120 and up to nth Search Manager 122 is able to at least load a desired number of partitions within a desired period of time. The latency tier of Line Type B 104 may be sufficiently low so as to at least allow nodes using the connection to react to system commands within a desired period of time, and to allow the system to provide a desired monitoring frequency.
- the bandwidth tier of Line Type D 108 may be sufficient for ensuring dependency managers in the fourth cluster including First Dependency Manager 160 up to nth Dependency Manager 162 are able to at least receive a desired number of package requests and return a desired number of packages. Additionally, the bandwidth tier of Line Type D 108 may be sufficient for ensuring supervisors in the fifth cluster including First Supervisor 170 up to nth Supervisor 172 are able to at least monitor and manage a desired number of nodes and modules. The latency tier of Line Type D 108 may be sufficiently low so as to allow the system to be managed in a desired period of time and to provide a desired monitoring frequency.
- the fifth cluster including First Supervisor 170 up to nth Supervisor 172 may have a Line Type D 108 connection to one or more node managers in any suitable number of nodes.
- additional clusters including one or more other types of modules may be connected to First Network Segment 110, Second Network Segment 112, and/or Third Network Segment 114, where the connections may include Line Type A 102, Line Type B 104, Line Type C 106, and/or Line Type D 108 connections.
- nodes of use in the first, second, third, fourth, fifth, and sixth cluster, as well as nodes of use in other clusters not shown in FIG. 1, may include hardware components suitable for running one or more types of modules.
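The segment-to-cluster connections described for FIG. 1 can be written down as a lookup table. The following Python sketch is illustrative only: the dictionary layout, the key names, and the `shared_segments` helper are assumptions, while the line type per connection follows the description of the three network segments (external servers are omitted).

```python
from typing import Dict, List

# Line type used by each cluster on each network segment, per the FIG. 1 description.
CONNECTIONS: Dict[str, Dict[str, str]] = {
    "first_segment": {"search_manager": "A"},
    "second_segment": {
        "search_manager": "A",
        "analytics_agent": "A",
        "search_conductor": "B",
        "dependency_manager": "D",
        "supervisor": "D",
    },
    "third_segment": {
        "search_conductor": "C",
        "dependency_manager": "D",
        "supervisor": "D",
        "partitioner": "C",
    },
}


def shared_segments(cluster_a: str, cluster_b: str) -> List[str]:
    """Return the segments on which both clusters have a connection."""
    return [
        segment
        for segment, clusters in CONNECTIONS.items()
        if cluster_a in clusters and cluster_b in clusters
    ]
```

Under this reading, search managers and search conductors meet only on the second segment, while partitioners and search conductors meet only on the third, which keeps interactive query traffic apart from partition-loading traffic.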
- nodes including a suitable number of Search Managers may require a CPU of a sufficiently high computation capacity so as to execute queries on an in-memory database within a desired period of time.
- nodes including a suitable number of Analytics Agents may require a CPU of a sufficiently high computation capacity so as to allow the system to process one or more desired analytics within a desired period of time.
- CPUs of use in nodes including said one or more Analytics Agents may be of a higher capacity than those used in nodes including a suitable number of Search Managers.
- nodes including a suitable number of Search Conductors may require a CPU of a sufficiently high computation capacity so as to allow the search conductors to execute search queries on the associated partition within a desired period of time, where the CPU capacity may be sufficient so as to achieve a desired level of idleness.
- nodes including a suitable number of Partitioners may require a CPU of a sufficiently high computation capacity so as to partition one or more desired collections within a desired period of time.
- the CPU may be of a sufficiently high computation capacity so as to allow partitioned data to be indexed and/or compressed within a desired period of time.
- nodes including a suitable number of Search Managers may require a memory capacity sufficiently high so as to execute queries on an in-memory database within a desired period of time, where the amount of memory may be sufficient to collate results from a suitable number of search conductors for a desired number of queries.
- nodes including a suitable number of Analytics Agents may require a memory capacity sufficiently high so as to allow the system to process one or more desired analytics within a desired period of time.
- memory capacities of use in nodes including said one or more Analytics Agents may be of a higher capacity than those used in nodes including a suitable number of Search Managers.
- nodes including a suitable number of Search Conductors may require memory capacity sufficiently high so as to allow the search conductors to load a desired amount of information into memory.
- the amount of memory required per search conductor may be proportional to:
- the amount of memory required per search conductor may not exceed the amount of memory that may be installed on the associated node.
- the amount of memory per node may also be adjusted according to the complexity of the queries to be executed by the search conductor. More complex queries may cause a load on the node's CPU sufficiently high so as to reduce the effectiveness of having a larger memory capacity. Less complex queries may cause a load on the node's CPU sufficiently light so as to allow a higher memory capacity to be used effectively.
- nodes including a suitable number of Partitioners may require a memory capacity sufficiently high so as to partition one or more desired collections within a desired period of time.
- the memory capacity may be of a sufficiently high capacity so as to allow partitioned data to be indexed and/or compressed within a desired period of time.
- nodes including a suitable number of Partitioners may require a hard disk of a speed sufficient to allow the Partitioners to process the data within a desired period of time.
- the number of nodes used in a system hosting an in-memory database may be sufficiently high so as to allow the system hosting the in-memory database to operate at a desired capacity with a desired rate of growth, where the number of nodes in the system may allow the system to grow for a lead time required to acquire hardware.
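One way to read the sizing guidance above is as a simple capacity projection: provision enough nodes to hold the data expected by the time new hardware can arrive. The Python sketch below is an assumption-laden illustration, not a method stated in the disclosure; the function name, the compound monthly growth model, and all parameters are hypothetical.

```python
import math


def nodes_to_provision(
    current_gb: float,
    monthly_growth_rate: float,
    lead_time_months: float,
    usable_gb_per_node: float,
) -> int:
    """Estimate the node count needed to absorb growth during the hardware lead time.

    Assumes compound monthly growth and that every node contributes the same
    usable memory; both are simplifications made for this sketch.
    """
    if usable_gb_per_node <= 0:
        raise ValueError("usable_gb_per_node must be positive")
    projected_gb = current_gb * (1 + monthly_growth_rate) ** lead_time_months
    return math.ceil(projected_gb / usable_gb_per_node)


# Hypothetical inputs: 1,000 GB today, 10% monthly growth, 3-month lead time,
# 64 GB of usable memory per node.
needed = nodes_to_provision(1000, 0.10, 3, 64)
```

A real deployment would also leave headroom for indexes, replicas, and query working memory, none of which this sketch models.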
- Example #1 is a system hosting an in-memory database with connections set up in a manner similar to that described in FIG. 1 .
- Search Managers, Search Conductors and Analytics Agents are all directly participating in the flow of an interactive user query. To minimize the latency of the user query, these modules are connected with the lowest latency connections.
- Search Managers and Analytics Agents work with the larger aggregated answer sets and benefit from the greatest bandwidth, whereas the Search Conductors deal with the hundreds of partition-based answer set components, which require less bandwidth.
- Partitioners deal with large data volumes but at non-interactive speeds, so they have both moderate latency and moderate bandwidth connections.
- Supervisors and Dependency Managers are non-interactive and handle low data volumes, so they require only the lowest-bandwidth and highest-latency connections. This configuration attempts to minimize cost based on actual need.
- Line Type A is an InfiniBand connection with a bandwidth of 40 Gb/s and a latency of 1 microsecond or less
- Line Type B is an InfiniBand connection with a bandwidth of 20 Gb/s and a latency of 1 microsecond or less
- Line Type C is a 10 Gb/s Ethernet connection
- Line Type D is a 100 Mb/s Ethernet connection.
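To get a feel for what these four line types mean in practice, the raw transmission time of a payload can be estimated as size divided by bandwidth. The Python sketch below assumes the figures in Example #1 are gigabits and megabits per second in decimal units, and it ignores latency, protocol overhead, and contention; the function and constant names are hypothetical.

```python
# Nominal link rates from Example #1, in bits per second (decimal units assumed).
BANDWIDTH_BITS_PER_SECOND = {
    "A": 40e9,   # InfiniBand, 40 Gb
    "B": 20e9,   # InfiniBand, 20 Gb
    "C": 10e9,   # 10 Gb Ethernet
    "D": 100e6,  # 100 Mb Ethernet
}


def transmission_seconds(payload_bytes: float, line_type: str) -> float:
    """Time to push payload_bytes onto the link at its nominal rate.

    This is a lower bound: latency and protocol overhead are not modeled.
    """
    return payload_bytes * 8 / BANDWIDTH_BITS_PER_SECOND[line_type]


# Compare a hypothetical 1 GB partition transfer across the four line types.
one_gigabyte = 1e9
estimates = {name: transmission_seconds(one_gigabyte, name) for name in BANDWIDTH_BITS_PER_SECOND}
```

The gap between Line Type A and Line Type D spans more than two orders of magnitude, which is consistent with reserving Line Type D for low-volume supervisor and dependency manager traffic.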
- nodes including a search manager include CPUs able to operate at 2 Teraflops
- nodes including a search conductor include CPUs able to operate at 4 Teraflops
- nodes including an analytics agent include CPUs able to operate at 4 Teraflops
- nodes including a partitioner include CPUs able to operate at 6 Teraflops.
- nodes including a search conductor include 32 to 64 GB of RAM
- nodes including an analytics agent include 32 to 64 GB of RAM
- 6 nodes including a partitioner each include 64 GB of RAM and a 10,000 RPM hard disk.
- Example #2 is a system hosting an in-memory database with connections set up in a manner similar to that described in FIG. 1 .
- Search Managers, Search Conductors and Analytics Agents are all directly participating in the flow of interactive user queries and data inserts.
- modules are connected using different network tiers.
- This configuration allows for responsive, interactive user queries by utilizing a low-latency network tier, such as InfiniBand, while also allowing high-volume data inserts utilizing a separate high-bandwidth network tier. Both types of operations run optimally without interfering with each other.
- Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
- a code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
- a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
- Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
- When implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium.
- the steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module which may reside on a computer-readable or processor-readable storage medium.
- non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another.
- a non-transitory processor-readable storage medium may be any available medium that may be accessed by a computer.
- non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor.
- Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
Abstract
Disclosed here are distributed computing system connection configurations having multiple connection bandwidth and latency tiers. Also disclosed are connection configurations including a suitable number of network segments, where network segments may be connected to external servers and clusters including search managers, analytics agents, search conductors, dependency managers, supervisors, and partitioners, amongst others. In one or more embodiments, modules may be connected to the network segments using a desired bandwidth and latency tier. Disclosed here are hardware components suitable for running one or more types of modules on one or more suitable nodes. One or more suitable hardware components included in said clusters include CPUs, Memory, and Hard Disk, amongst others.
Description
- This application is a continuation of U.S. patent application Ser. No. 14/557,827, entitled “Real-Time Distributed In Memory Search Architecture,” filed Dec. 2, 2014, which is a non-provisional application that claims the benefit of U.S. Provisional Application No. 61/910,850, entitled “Real-Time Distributed In Memory Search Architecture,” filed on Dec. 2, 2013, each of which is hereby incorporated by reference in its entirety.
- This application is related to U.S. patent application Ser. No. 14/557,794, entitled “Method for Disambiguating Features in Unstructured Text,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,300, entitled “Event Detection Through Text Analysis Using Trained Event Template Models,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/557,807, entitled “Method for Facet Searching and Search Suggestions,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,254, entitled “Design and Implementation of Clustered In-Memory Database,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/557,951, entitled “Fault Tolerant Architecture for Distributed Computing Systems,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,009, entitled “Dependency Manager for Databases,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,055, entitled “Pluggable Architecture for Embedding Analytics in Clustered In-Memory Databases,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,101, entitled “Non-Exclusionary Search Within In-Memory Databases,” filed Dec. 2, 2014; and U.S. patent application Ser. No. 14/557,900, entitled “Data Record Compression With Progressive and/or Selective Decompression,” filed Dec. 2, 2014; each of which is incorporated herein by reference in its entirety.
- The present disclosure relates in general to in-memory databases, and more specifically to hardware configurations of use in in-memory databases.
- Computers are powerful tools of use in storing and providing access to vast amounts of information, while databases are a common mechanism for storing information on computer systems while providing easy access to users. Typically, a database is an organized collection of information stored as “records” having “fields” of information (e.g., a restaurant database may have a record for each restaurant in a region, where each record contains fields describing characteristics of the restaurant, such as name, address, type of cuisine, and the like).
- In operation, a database management system frequently needs to retrieve data from or persist data to storage devices such as disks. Unfortunately, access to such storage devices can be somewhat slow. To speed up access to data, databases typically employ a “cache” or “buffer cache” which is a section of relatively faster memory (e.g., random access memory (RAM)) allocated to store recently used data objects. Memory is typically provided on semiconductor or other electrical storage media and is coupled to a CPU (central processing unit) via a fast data bus which enables data maintained in memory to be accessed more rapidly than data stored on disks.
- One approach that may be taken when attempting to solve this problem is to store all the information in the database in memory. However, because memory provided on computer systems has a limited size, there are a number of obstacles that must be faced when attempting to handle databases of a larger scale. Some of these obstacles may include determining the technologies required to operate the database, including the networking needed, the hardware required for different nodes, and others.
- As such, there is a continuing need for improved methods of storing and retrieving data at high speeds at a large scale.
- Disclosed herein are connection configurations for nodes of a system hosting an in-memory database, the nodes having multiple connection bandwidth and latency tiers, where a first bandwidth tier may be associated with a bandwidth higher than a second bandwidth tier, the second bandwidth tier may be associated with a bandwidth higher than a third bandwidth tier, the third bandwidth tier may be associated with a bandwidth higher than a fourth bandwidth tier, and the first latency tier may be associated with a latency lower than the second latency tier.
- Disclosed herein is a distributed-computing system having multiple network segments, each with bandwidth and latency tiers applied to the distributed in-memory data platform. The system includes connection configurations having a suitable number of network segments, where network segments may be connected to a number of servers internal and external to the system, and to clusters of servers in the system. The servers of the system may include software modules such as search managers, analytics agents, search conductors, dependency managers, supervisors, and partitioners, amongst others. Servers and modules may be connected to the desired network segments to achieve desired bandwidth and latency needs. Servers and modules may be connected to the desired network segments to separate different classes of network traffic, to prevent one class of traffic from interfering with another.
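The traffic separation described above can be pictured as a lookup from traffic class to network segment. The following is a minimal sketch only: the class names, the segment labels `segment_x` and `segment_y`, and both helper functions are assumptions made for illustration and do not appear in the disclosure.

```python
from enum import Enum
from typing import Dict


class TrafficClass(Enum):
    """Hypothetical traffic classes; the names are assumptions for this sketch."""
    INTERACTIVE_QUERY = "interactive_query"  # low-latency user query traffic
    BULK_LOAD = "bulk_load"                  # high-volume, non-interactive data movement


# Assumed assignment: each traffic class gets its own segment so that bulk
# transfers cannot compete with interactive queries for the same links.
SEGMENT_FOR_CLASS: Dict[TrafficClass, str] = {
    TrafficClass.INTERACTIVE_QUERY: "segment_x",
    TrafficClass.BULK_LOAD: "segment_y",
}


def segment_for(traffic: TrafficClass) -> str:
    """Return the segment that should carry the given class of traffic."""
    return SEGMENT_FOR_CLASS[traffic]


def classes_are_isolated(mapping: Dict[TrafficClass, str]) -> bool:
    """True when no two traffic classes share a segment."""
    segments = list(mapping.values())
    return len(segments) == len(set(segments))
```

Under this assumed mapping, `classes_are_isolated(SEGMENT_FOR_CLASS)` holds because each class has a dedicated segment. A real deployment would derive the mapping from its own bandwidth and latency needs.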
- In one embodiment, a system comprising one or more nodes hosting an in-memory database comprises: a plurality of storage nodes comprising non-transitory machine-readable storage medium storing one or more partitions of a collection, wherein the collection stored by each respective storage node contains one or more records of a database, and wherein the storage medium of each respective storage node comprises main memory; a search manager node comprising a processor generating one or more search conductor queries using a search query received from a user node, transmitting the one or more search conductor queries to one or more search conductor nodes according to the search query, and forwarding one or more sets of search results to one or more analytics agent nodes according to the search query responsive to receiving the one or more sets of search results; an analytics agent node comprising a processor executing one or more analytics algorithms responsive to receiving a set of search results from the search manager node; a search conductor node comprising a processor querying the collection of the database records of a storage node according to a search conductor query in response to receiving the search conductor query from the search manager, and transmitting the set of one or more search results to the search manager node in response to identifying the one or more search results of the set, wherein each respective search result corresponds to a database record containing data satisfying the search conductor query; and a plurality of network segments comprising one or more connections between one or more nodes communicating over each respective network segment, wherein a first network segment comprises the search manager, the search conductor, and the analytics agent.
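The query flow in this embodiment (search manager to search conductors, back to the search manager, then on to an analytics agent) can be sketched in a single process as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function names, the exact-match query on one field, and the sorting step standing in for an analytics algorithm are all hypothetical, and no networking is modeled.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def search_conductor(partition: List[Record], field: str, value: str) -> List[Record]:
    """Query one in-memory partition and return the records that match."""
    return [record for record in partition if record.get(field) == value]


def search_manager(
    partitions: List[List[Record]],
    field: str,
    value: str,
    analytics_agent: Callable[[List[Record]], List[Record]],
) -> List[Record]:
    """Issue one conductor query per partition, collate, then forward for analytics."""
    collated: List[Record] = []
    for partition in partitions:
        collated.extend(search_conductor(partition, field, value))
    return analytics_agent(collated)


def sort_by_name(results: List[Record]) -> List[Record]:
    """Stand-in analytics step (an assumption): order results by their 'name' field."""
    return sorted(results, key=lambda record: record["name"])


# Two partitions of a small, made-up restaurant collection.
partitions = [
    [{"name": "Luigi's", "cuisine": "italian"}, {"name": "Taco Hut", "cuisine": "mexican"}],
    [{"name": "Bella", "cuisine": "italian"}],
]
results = search_manager(partitions, "cuisine", "italian", sort_by_name)
```

Here `results` holds the two Italian restaurants, gathered from both partitions and ordered by name.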
- Numerous other aspects, features and benefits of the present disclosure may be made apparent from the following detailed description taken together with the drawing figures.
- The present disclosure can be better understood by referring to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. In the figures, reference numerals designate corresponding parts throughout the different views.
- FIG. 1 is a connection diagram for a computing system hosting an in-memory database system, in which the nodes are logically clustered.
- As used here, the following terms may have the following definitions:
- “Node” refers to a computer hardware configuration suitable for running one or more modules.
- “Cluster” refers to a set of one or more nodes.
- “Module” refers to a computer software component suitable for carrying out one or more defined tasks.
- “Collection” refers to a discrete set of records.
- “Record” refers to one or more pieces of information that may be handled as a unit.
- “Field” refers to one data element within a record.
- “Partition” refers to an arbitrarily delimited portion of records of a collection.
- “Search Manager”, or “S.M.”, refers to a module configured to at least receive one or more queries and return one or more search results.
- “Analytics Agent”, “Analytics Module”, “A.A.”, or “A.M.”, refers to a module configured to at least receive one or more records, process said one or more records, and return the resulting one or more processed records.
- “Search Conductor”, or “S.C.”, refers to a module configured to at least run one or more queries on a partition and return the search results to one or more search managers.
- “Node Manager”, or “N.M.”, refers to a module configured to at least perform one or more commands on a node and communicate with one or more supervisors.
- “Supervisor” refers to a module configured to at least communicate with one or more components of a system and determine one or more statuses.
- “Heartbeat”, or “HB”, refers to a signal communicating at least one or more statuses to one or more supervisors.
- “Partitioner” refers to a module configured to at least divide one or more collections into one or more partitions.
- “Dependency Manager”, or “D.M.”, refers to a module configured to at least include one or more dependency trees associated with one or more modules, partitions, or suitable combinations, in a system; to at least receive a request for information relating to any one or more suitable portions of said one or more dependency trees; and to at least return one or more configurations derived from said portions.
- “Database” refers to any system including any combination of clusters and modules suitable for storing one or more collections and suitable to process one or more queries.
- “Query” refers to a request to retrieve information from one or more suitable databases.
- “Memory” refers to any hardware component suitable for storing information and retrieving said information at a sufficiently high speed.
- “Fragment” refers to separating records into smaller records until a desired level of granularity is achieved.
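To make the “Collection”, “Partition”, and “Partitioner” definitions above concrete, the sketch below divides a collection of records into a fixed number of partitions. The round-robin rule is an assumption chosen for simplicity, since the definitions only require that a partition be an arbitrarily delimited portion of a collection. The name `partition_collection` is hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def partition_collection(collection: List[Record], partition_count: int) -> List[List[Record]]:
    """Split a collection into partition_count partitions, round-robin by position."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    partitions: List[List[Record]] = [[] for _ in range(partition_count)]
    for index, record in enumerate(collection):
        partitions[index % partition_count].append(record)
    return partitions


# A made-up collection of five records split into two partitions.
collection = [{"id": str(i)} for i in range(5)]
parts = partition_collection(collection, 2)
```

Every record lands in exactly one partition, so the partitions together still cover the whole collection.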
- The present disclosure is here described in detail with reference to embodiments illustrated in the drawings, which form a part hereof. Other embodiments may be used and/or other changes may be made without departing from the spirit or scope of the present disclosure. The illustrative embodiments described in the detailed description are not meant to be limiting of the subject matter presented here.
- The present disclosure describes hardware configurations that may be implemented in distributed computing systems hosting an in-memory database, sometimes referred to as “in-memory database architectures.”
- An in-memory database is a database storing data in records controlled by a database management system (DBMS) configured to store data records in a device's main memory, as opposed to conventional databases and DBMS modules that store data in “disk” memory. Conventional disk storage requires processors (CPUs) to execute read and write commands to a device's hard disk, thus requiring CPUs to execute instructions to locate (i.e., seek) and retrieve the memory location for the data, before performing some type of operation with the data at that memory location. In-memory database systems access data that is placed into main memory, and then addressed accordingly, thereby mitigating the number of instructions performed by the CPUs and eliminating the seek time associated with CPUs seeking data on hard disk.
- In-memory databases may be implemented in a distributed computing architecture, which may be a computing system comprising one or more nodes configured to aggregate the nodes' respective resources (e.g., memory, disks, processors). As disclosed herein, embodiments of a computing system hosting an in-memory database may distribute and store data records of the database among one or more nodes. In some embodiments, these nodes are formed into “clusters” of nodes. In some embodiments, these clusters of nodes store portions, or “collections,” of database information.
- In one or more embodiments, suitable connection configurations may include a suitable number of network segments, where network segments may be connected to external servers, a first cluster including one or more search managers, a second cluster including one or more analytics agents, a third cluster including one or more search conductors, a fourth cluster including one or more dependency managers, a fifth cluster including one or more supervisors, and a sixth cluster including one or more partitioners. In one or more embodiments, modules may be connected to the network segments using a desired bandwidth and latency tier.
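As a rough illustration of such a connection configuration, the sketch below records which network segment and line type each cluster uses, following the attachments described for FIG. 1 in this disclosure. The dictionary layout, the key names, and the two helper functions are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (segment, line type) attachments per cluster, following the FIG. 1 description.
ATTACHMENTS: Dict[str, List[Tuple[str, str]]] = {
    "search_manager": [("first", "A"), ("second", "A")],
    "analytics_agent": [("second", "A")],
    "search_conductor": [("second", "B"), ("third", "C")],
    "dependency_manager": [("second", "D"), ("third", "D")],
    "supervisor": [("second", "D"), ("third", "D")],
    "partitioner": [("third", "C")],
}


def clusters_on(segment: str) -> List[str]:
    """List, in alphabetical order, the clusters attached to a network segment."""
    return sorted(
        cluster
        for cluster, links in ATTACHMENTS.items()
        if any(seg == segment for seg, _ in links)
    )


def share_segment(a: str, b: str) -> bool:
    """True when two clusters have at least one network segment in common."""
    segments_a = {seg for seg, _ in ATTACHMENTS[a]}
    segments_b = {seg for seg, _ in ATTACHMENTS[b]}
    return bool(segments_a & segments_b)
```

For instance, `share_segment("search_manager", "partitioner")` is false in this layout, while search conductors share the third segment with partitioners and the second segment with search managers.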
- In one or more embodiments of the present disclosure, nodes of use in the first, second, third, fourth, fifth, and sixth clusters, as well as nodes of use in other clusters, may include hardware components suitable for running one or more types of modules. Suitable hardware components included in said clusters include CPUs, memory, and hard disks, amongst others.
- In one or more embodiments, the number of nodes used in a system hosting an in-memory database may be sufficiently high so as to allow the system hosting the in-memory database to operate at a desired capacity with a desired rate of growth, where the number of nodes in the system may allow the system to grow for a lead time required to acquire hardware.
- In one or more embodiments, configurations disclosed here may reduce costs associated with deploying one or more suitable in-memory databases.
- System Connections
- FIG. 1 shows Connection Diagram 100 having Line Type A 102, Line Type B 104, Line Type C 106, Line Type D 108, First Network Segment 110, Second Network Segment 112, Third Network Segment 114, First Search Manager 120, nth Search Manager 122, First Analytics Agent 130, nth Analytics Agent 132, First Search Conductor 140, nth Search Conductor 142, Partitioner 150, First Dependency Manager 160, nth Dependency Manager 162, First Supervisor 170, and nth Supervisor 172.
- In one or more embodiments, Line Type A 102 may represent a connection having a first bandwidth tier and a first latency tier, Line Type B 104 may represent a connection having a second bandwidth tier and the first latency tier, Line Type C 106 may represent a connection having a third bandwidth tier and a second latency tier, and Line Type D 108 may represent a connection having a fourth bandwidth tier and the second latency tier. In one or more embodiments, the first bandwidth tier may be associated with a bandwidth higher than the second bandwidth tier, the second bandwidth tier may be associated with a bandwidth higher than the third bandwidth tier, the third bandwidth tier may be associated with a bandwidth higher than the fourth bandwidth tier, and the first latency tier may be associated with a latency lower than the second latency tier.
- First Network Segment
- In one or more embodiments, a First Network Segment 110 may be connected to external servers using any suitable connection, including Line Type A 102, Line Type B 104, and Line Type C 106. First Network Segment 110 may also be connected to a first cluster including a First Search Manager 120 and up to an nth Search Manager 122 using a Line Type A 102 connection.
- Second Network Segment
- In one or more embodiments, a Second Network Segment 112 may be connected to the first cluster including First Search Manager 120 and up to nth Search Manager 122 using a Line Type A 102 connection. Second Network Segment 112 may also be connected to a second cluster including a First Analytics Agent 130 and up to an nth Analytics Agent 132 using a Line Type A 102 connection, a third cluster including a First Search Conductor 140 up to an nth Search Conductor 142 using a Line Type B 104 connection, a fourth cluster including a First Dependency Manager 160 up to nth Dependency Manager 162 using a Line Type D 108 connection, and a fifth cluster including a First Supervisor 170 up to nth Supervisor 172 using a Line Type D 108 connection.
- In one or more embodiments, the bandwidth tier of Line Type A 102 may be sufficient for ensuring the first cluster including First Search Manager 120 and up to nth Search Manager 122 is able to at least receive an appropriate amount of information from a suitable number of search conductors in the third cluster including First Search Conductor 140 up to an nth Search Conductor 142. The latency tier of Line Type A 102 may be sufficiently low so as to at least allow the system to be responsive enough to carry out a desired number of queries.
- In one or more embodiments, the bandwidth tier of Line Type B 104 may be sufficient for ensuring search conductors in the third cluster including First Search Conductor 140 up to an nth Search Conductor 142 are able to at least return a desired size of results. The latency tier of Line Type B 104 may be sufficiently low so as to at least allow the system to be responsive enough to carry out a desired number of queries.
- In one or more embodiments, the bandwidth tier of Line Type D 108 may be sufficient for ensuring dependency managers in the fourth cluster including First Dependency Manager 160 up to nth Dependency Manager 162 are able to at least receive a desired number of package requests and return a desired number of packages. Additionally, the bandwidth tier of Line Type D 108 may be sufficient for ensuring supervisors in the fifth cluster including First Supervisor 170 up to nth Supervisor 172 are able to at least monitor and manage a desired number of nodes and modules. The latency tier of Line Type D 108 may be sufficiently low so as to at least allow the system to be managed in a desired period of time and to provide a desired monitoring frequency.
- Third Network Segment
- In one or more embodiments, a Third Network Segment 114 may be connected to the third cluster including a First Search Conductor 140 up to an nth Search Conductor 142 using a Line Type C 106 connection, the fourth cluster including a First Dependency Manager 160 up to nth Dependency Manager 162 using a Line Type D 108 connection, the fifth cluster including a First Supervisor 170 up to nth Supervisor 172 using a Line Type D 108 connection, and a sixth cluster including one or more Partitioners 150 using a Line Type C 106 connection.
- In one or more embodiments, the bandwidth tier of Line Type C 106 may be sufficient for ensuring one or more Partitioners 150 are able to at least access a desired collection and output a desired number of partitions within a desired period of time. Additionally, the bandwidth tier of Line Type C 106 may be sufficient for ensuring the third cluster including First Search Conductor 140 up to an nth Search Conductor 142 is able to at least load a desired number of partitions within a desired period of time. The latency tier of Line Type C 106 may be sufficiently low so as to at least allow nodes using the connection to react to system commands within a desired period of time, and to allow the system to provide a desired monitoring frequency.
- In one or more embodiments, the bandwidth tier of Line Type D 108 may be sufficient for ensuring dependency managers in the fourth cluster including First Dependency Manager 160 up to nth Dependency Manager 162 are able to at least receive a desired number of package requests and return a desired number of packages. Additionally, the bandwidth tier of Line Type D 108 may be sufficient for ensuring supervisors in the fifth cluster including First Supervisor 170 up to nth Supervisor 172 are able to at least monitor and manage a desired number of nodes and modules. The latency tier of Line Type D 108 may be sufficiently low so as to allow the system to be managed in a desired period of time and to provide a desired monitoring frequency.
- In one or more embodiments, the fifth cluster including First Supervisor 170 up to nth Supervisor 172 may have a Line Type D 108 connection to one or more node managers in any suitable number of nodes.
- In one or more other embodiments, additional clusters including one or more other types of modules may be connected to First Network Segment 110, Second Network Segment 112, and/or Third Network Segment 114, where the connections may include Line Type A 102, Line Type B 104, Line Type C 106, and/or Line Type D 108 connections.
- System Hardware
- In one or more embodiments of the present disclosure, nodes of use in the first, second, third, fourth, fifth, and sixth cluster, as well as nodes of use in other clusters not shown in FIG. 1, may include hardware components suitable for running one or more types of modules.
- CPU Capacity
- In one or more embodiments, nodes including a suitable number of Search Managers may require a CPU of a sufficiently high computation capacity so as to execute queries on an in-memory database within a desired period of time.
- In one or more embodiments, nodes including a suitable number of Analytics Agents may require a CPU of a sufficiently high computation capacity so as to allow the system to process one or more desired analytics within a desired period of time. In one or more embodiments, CPUs of use in nodes including said one or more Analytics Agents may be of a higher capacity than those used in nodes including a suitable number of Search Managers.
- In one or more embodiments, nodes including a suitable number of Search Conductors may require a CPU of a sufficiently high computation capacity so as to allow the search conductors to execute search queries on the associated partition within a desired period of time, where the CPU capacity may be sufficient so as to achieve a desired level of idleness.
- In one or more embodiments, nodes including a suitable number of Partitioners may require a CPU of a sufficiently high computation capacity so as to partition one or more desired collections within a desired period of time. In one or more embodiments, the CPU may be of a sufficiently high computation capacity so as to allow partitioned data to be indexed and/or compressed within a desired period of time.
- Memory Capacity
- In one or more embodiments, nodes including a suitable number of Search Managers may require a memory capacity sufficiently high so as to execute queries on an in-memory database within a desired period of time, where the amount of memory may be sufficient to collate results from a suitable number of search conductors for a desired number of queries.
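The collation requirement above lends itself to a back-of-the-envelope estimate. The sketch below assumes a simple worst-case model in which every search conductor returns a full result set for every query in flight. The model, the function name, and the input figures are all hypothetical and are not taken from the disclosure.

```python
def collation_memory_bytes(
    conductors: int,
    result_bytes_per_conductor: int,
    concurrent_queries: int,
) -> int:
    """Worst-case bytes a search manager holds while collating results.

    Assumes each conductor returns one full result set per in-flight query.
    """
    return conductors * result_bytes_per_conductor * concurrent_queries


# Hypothetical inputs: 100 conductors, 1 MB of results each, 20 queries in flight.
estimate = collation_memory_bytes(
    conductors=100,
    result_bytes_per_conductor=1_000_000,
    concurrent_queries=20,
)
```

With these made-up inputs the estimate is 2,000,000,000 bytes, or about 2 GB of working memory for collation alone.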
- In one or more embodiments, nodes including a suitable number of Analytics Agents may require a memory capacity sufficiently high so as to allow the system to process one or more desired analytics within a desired period of time. In one or more embodiments, memory capacities of use in nodes including said one or more Analytics Agents may be of a higher capacity than those used in nodes including a suitable number of Search Managers.
- In one or more embodiments, nodes including a suitable number of Search Conductors may require memory capacity sufficiently high so as to allow the search conductors to load a desired amount of information into memory. In one or more embodiments, the amount of memory required per search conductor may be proportional to:
-
- Where the amount of memory required per search conductor may not exceed the amount of memory that may be installed on the associated node. In one or more embodiments, the amount of memory per node may also be adjusted according to the complexity of the queries to be executed by the search conductor. More complex queries may cause a load on the node's CPU sufficiently high so as to reduce the effectiveness of having a larger memory capacity. Less complex queries may cause a load on the node's CPU sufficiently light so as to allow a higher memory capacity to be used effectively.
- In one or more embodiments, nodes including a suitable number of Partitioners may require a memory capacity sufficiently high so as to partition one or more desired collections within a desired period of time. In one or more embodiments, the memory capacity may be of a sufficiently high capacity so as to allow partitioned data to be indexed and/or compressed within a desired period of time.
- Hard Disk
- In one or more embodiments, nodes including a suitable number of Partitioners may require a hard disk of a speed sufficient to allow the Partitioners to process the data within a desired period of time.
- Growth Rate
- In one or more embodiments, the number of nodes used in a system hosting an in-memory database may be sufficiently high so as to allow the system hosting the in-memory database to operate at a desired capacity with a desired rate of growth, where the number of nodes in the system may allow the system to grow for a lead time required to acquire hardware.
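The relationship between growth rate and hardware lead time can be illustrated with a short calculation. The sketch below assumes compound monthly growth in data volume and a fixed usable memory capacity per node. The growth model, the function name, and every number shown are assumptions for illustration only.

```python
import math


def nodes_needed(
    current_gb: float,
    monthly_growth: float,
    lead_time_months: int,
    gb_per_node: float,
) -> int:
    """Nodes required to hold the data volume projected at the end of the lead time."""
    projected_gb = current_gb * (1.0 + monthly_growth) ** lead_time_months
    return math.ceil(projected_gb / gb_per_node)


# Hypothetical inputs: 1,000 GB today, 10% monthly growth, 3-month lead time, 64 GB per node.
needed_today = nodes_needed(1000, 0.0, 3, 64)
needed_after_lead_time = nodes_needed(1000, 0.10, 3, 64)
```

With these made-up figures, 16 nodes hold the data today, but 21 are needed to absorb three months of growth, so hardware for the additional nodes would have to be in place before new orders can arrive.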
- Example #1 is a system hosting an in-memory database with connections set up in a manner similar to that described in FIG. 1. Search Managers, Search Conductors, and Analytics Agents all participate directly in the flow of an interactive user query. To minimize the latency of the user query, these modules are connected with the lowest-latency connections. Search Managers and Analytics Agents work with the larger aggregated answer sets and benefit from the greatest bandwidth, whereas the Search Conductors deal with the hundreds of partition-based answer set components, which require less bandwidth. Partitioners deal with large data volumes but at non-interactive speeds, so they have both moderate-latency and moderate-bandwidth connections. Supervisors and Dependency Managers are non-interactive and handle low data volumes, and so can use the lowest-bandwidth, highest-latency connections. This configuration attempts to minimize cost based on actual need.
- Therefore, in this example, Line Type A is an InfiniBand connection with a 40 Gb bandwidth and a latency of 1 microsecond or less; Line Type B is an InfiniBand connection with a 20 Gb bandwidth and a latency of 1 microsecond or less; Line Type C is a 10 Gb Ethernet connection; and Line Type D is a 100 Mb Ethernet connection. In this example, nodes including a search manager include CPUs able to operate at 2 Teraflops; nodes including a search conductor include CPUs able to operate at 4 Teraflops; nodes including an analytics agent include CPUs able to operate at 4 Teraflops; and nodes including a partitioner include CPUs able to operate at 6 Teraflops. In this example, nodes including a search conductor include 32 to 64 GB of RAM, nodes including an analytics agent include 32 to 64 GB of RAM, and 6 nodes including a partitioner each include 64 GB of RAM and a 10,000 RPM hard disk.
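The line types of Example #1 can be written down as a small configuration and checked against the tier ordering given for FIG. 1. The sketch below is illustrative only: the `LineType` class and its field names are assumptions, the bandwidth figures are read as per-second rates, and latency is left unset for the Ethernet links because the example gives no figure for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LineType:
    name: str
    medium: str
    bandwidth_mbps: int              # nominal bandwidth, assumed to be megabits per second
    max_latency_us: Optional[float]  # None where Example #1 states no latency
    latency_tier: int                # 1 = first (lower-latency) tier, 2 = second tier


LINE_TYPES: List[LineType] = [
    LineType("A", "InfiniBand", 40_000, 1.0, 1),
    LineType("B", "InfiniBand", 20_000, 1.0, 1),
    LineType("C", "Ethernet", 10_000, None, 2),
    LineType("D", "Ethernet", 100, None, 2),
]


def bandwidth_strictly_decreasing(line_types: List[LineType]) -> bool:
    """Check that each line type has more bandwidth than the one after it."""
    return all(
        earlier.bandwidth_mbps > later.bandwidth_mbps
        for earlier, later in zip(line_types, line_types[1:])
    )
```

Calling `bandwidth_strictly_decreasing(LINE_TYPES)` confirms that the example values respect the first-through-fourth bandwidth tier ordering, with types A and B in the first latency tier and types C and D in the second.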
- Example #2 is a system hosting an in-memory database with connections set up in a manner similar to that described in FIG. 1. Search Managers, Search Conductors, and Analytics Agents all participate directly in the flow of interactive user queries and data inserts. To separate high-volume, backend data insert network traffic from interactive, low-latency user queries, modules are connected using different network tiers. This configuration allows for responsive, interactive user queries by utilizing a low-latency network tier, such as InfiniBand, while also allowing high-volume data inserts utilizing a separate high-bandwidth network tier. Both types of operations run optimally without interfering with each other.
- The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
- Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof. A code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
- The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the invention. Thus, the operation and behavior of the systems and methods were described without reference to the specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description herein.
- When implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module which may reside on a computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another. Non-transitory processor-readable storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
- The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
Claims (20)
1. A system comprising:
a node hosting an in-memory database, wherein the in-memory database comprises:
a search manager in communication with a first network segment and a second network segment via a first type of communication link;
an analytics agent in communication with the second network segment via the first type of communication link;
a search conductor in communication with the second network segment via a second type of communication link and a third network segment via a third type of communication link, wherein the first type of communication link is of higher bandwidth tier than the second type of communication link, wherein the second type of communication link is of higher bandwidth tier than the third type of communication link, wherein the first type of communication link and the second type of communication link are of a first latency tier;
a partitioner in communication with the third network segment via the third type of communication link;
a dependency manager in communication with the second network segment and the third network segment via a fourth type of communication link, wherein the third type of communication link is of higher bandwidth tier than the fourth type of communication link, wherein the third type of communication link and the fourth type of communication link are of a second latency tier, wherein the first latency tier is of latency lower than the second latency tier; and
a supervisor in communication with the second network segment and the third network segment via the fourth type of communication link.
2. The system of claim 1, wherein the first network segment, the second network segment, the third network segment, the search manager, the analytics agent, the search conductor, the partitioner, the dependency manager, and the supervisor define a network which communicates with a server outside of the network via at least one of the first type of communication link, the second type of communication link, or the third type of communication link.
3. The system of claim 1, wherein the search manager is configured to generate a search conductor query based on a user query received from a user computing device, to send the search conductor query to the search conductor, and to send a search result to the analytics agent based on a receipt of the search result from the search conductor, wherein the search result is based on the search conductor query.
4. The system of claim 3, wherein the search conductor is configured to query a collection containing a record of the in-memory database based on the search conductor query received from the search manager and to send the search result to the search manager based on an identification of the search result, wherein the record is stored at a storage node comprising a main memory, wherein the search result contains data which satisfies the search conductor query.
5. The system of claim 4, wherein the partitioner is configured to partition the collection and to distribute the collection onto at least the storage node based on a schema file and a receipt of the collection.
6. The system of claim 3, wherein the analytics agent is configured to perform an analytics algorithm based on a receipt of the search result from the search manager.
7. The system of claim 1, wherein the search manager, the search conductor, and the analytics agent define a cluster.
8. The system of claim 1, wherein the supervisor is configured to perform node status monitoring periodically and to send a configuration package file to a failed node based on a receipt of a status indicating a failed resource of the failed node.
9. The system of claim 1, wherein the dependency manager stores a configuration package file and a dependency tree, wherein the dependency manager is configured to determine the configuration package file for a failed node based on the dependency tree and to send the configuration package file to the supervisor based on a receipt of a request for the configuration package file from the supervisor.
10. The system of claim 9, wherein the supervisor sends the request for the configuration package file based on the supervisor receiving a message informative of the failed node.
11. A method comprising:
configuring, by a computer, a search manager to be in communication with a first network segment and a second network segment via a first type of communication link;
configuring, by the computer, an analytics agent to be in communication with the second network segment via the first type of communication link;
configuring, by the computer, a search conductor to be in communication with the second network segment via a second type of communication link and a third network segment via a third type of communication link, wherein the first type of communication link is of higher bandwidth tier than the second type of communication link, wherein the second type of communication link is of higher bandwidth tier than the third type of communication link, wherein the first type of communication link and the second type of communication link are of a first latency tier;
configuring, by the computer, a partitioner to be in communication with the third network segment via the third type of communication link;
configuring, by the computer, a dependency manager to be in communication with the second network segment and the third network segment via a fourth type of communication link, wherein the third type of communication link is of higher bandwidth tier than the fourth type of communication link, wherein the third type of communication link and the fourth type of communication link are of a second latency tier, wherein the first latency tier is of latency lower than the second latency tier;
configuring, by the computer, a supervisor to be in communication with the second network segment and the third network segment via the fourth type of communication link, wherein a node hosts an in-memory database, wherein the in-memory database comprises the search manager, the analytics agent, the search conductor, the partitioner, the dependency manager, and the supervisor.
12. The method of claim 11, wherein the first network segment, the second network segment, the third network segment, the search manager, the analytics agent, the search conductor, the partitioner, the dependency manager, and the supervisor define a network which communicates with a server outside of the network via at least one of the first type of communication link, the second type of communication link, or the third type of communication link.
13. The method of claim 11, further comprising:
generating, by the search manager, a search conductor query based on a user query received from a user computing device;
sending, by the search manager, the search conductor query to the search conductor;
sending a search result to the analytics agent based on a receipt of the search result from the search conductor, wherein the search result is based on the search conductor query.
14. The method of claim 13, further comprising:
querying, by the search conductor, a collection containing a record of the in-memory database based on the search conductor query received from the search manager;
sending, by the search conductor, the search result to the search manager based on an identification of the search result, wherein the record is stored at a storage node comprising a main memory, wherein the search result contains data which satisfies the search conductor query.
15. The method of claim 14, further comprising:
partitioning, by the partitioner, the collection;
distributing, by the partitioner, the collection onto at least the storage node based on a schema file and a receipt of the collection.
16. The method of claim 13, further comprising:
performing, by the analytics agent, an analytics algorithm based on a receipt of the search result from the search manager.
17. The method of claim 11, wherein the search manager, the search conductor, and the analytics agent define a cluster.
18. The method of claim 11, further comprising:
performing, by the supervisor, node status monitoring periodically;
sending, by the supervisor, a configuration package file to a failed node based on a receipt of a status indicating a failed resource of the failed node.
19. The method of claim 11, further comprising:
storing, by the dependency manager, a configuration package file and a dependency tree;
determining, by the dependency manager, the configuration package file for a failed node based on the dependency tree;
sending, by the dependency manager, the configuration package file to the supervisor based on a receipt of a request for the configuration package file from the supervisor.
20. The method of claim 19, further comprising:
sending, by the supervisor, the request for the configuration package file based on the supervisor receiving a message informative of the failed node.
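The link-type ordering recited in claim 11 can be sketched as a small data model. This is a minimal illustration only, not the applicant's implementation. The `LinkType` class, the numeric tier values, and the function names are hypothetical: a larger `bandwidth_tier` stands for more bandwidth and a smaller `latency_tier` for lower latency, whereas the claims specify only the relative ordering.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LinkType:
    """Hypothetical model of one type of communication link."""
    name: str
    bandwidth_tier: int  # assumption: larger value = higher bandwidth tier
    latency_tier: int    # assumption: smaller value = lower latency


# Illustrative ranks only; claim 11 fixes the ordering, not the values.
FIRST = LinkType("first", bandwidth_tier=4, latency_tier=1)
SECOND = LinkType("second", bandwidth_tier=3, latency_tier=1)
THIRD = LinkType("third", bandwidth_tier=2, latency_tier=2)
FOURTH = LinkType("fourth", bandwidth_tier=1, latency_tier=2)

# Component -> {network segment: link type}, following claim 11.
TOPOLOGY: Dict[str, Dict[str, LinkType]] = {
    "search_manager": {"segment_1": FIRST, "segment_2": FIRST},
    "analytics_agent": {"segment_2": FIRST},
    "search_conductor": {"segment_2": SECOND, "segment_3": THIRD},
    "partitioner": {"segment_3": THIRD},
    "dependency_manager": {"segment_2": FOURTH, "segment_3": FOURTH},
    "supervisor": {"segment_2": FOURTH, "segment_3": FOURTH},
}


def tiers_match_claim_11() -> bool:
    """Check the bandwidth and latency orderings recited in claim 11."""
    bandwidth_ok = (
        FIRST.bandwidth_tier
        > SECOND.bandwidth_tier
        > THIRD.bandwidth_tier
        > FOURTH.bandwidth_tier
    )
    latency_ok = (
        FIRST.latency_tier == SECOND.latency_tier
        and THIRD.latency_tier == FOURTH.latency_tier
        and FIRST.latency_tier < THIRD.latency_tier
    )
    return bandwidth_ok and latency_ok


def components_on(segment: str) -> List[str]:
    """Return the components attached to a segment, sorted by name."""
    return sorted(name for name, links in TOPOLOGY.items() if segment in links)
```

Under these assumptions, `components_on("segment_1")` lists only the search manager, which matches the claim language: the search manager is the only component configured on the first network segment.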
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/920,202 US20160140235A1 (en) | 2013-12-02 | 2015-10-22 | Real-time distributed in memory search architecture |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361910850P | 2013-12-02 | 2013-12-02 | |
US14/557,827 US9223875B2 (en) | 2013-12-02 | 2014-12-02 | Real-time distributed in memory search architecture |
US14/920,202 US20160140235A1 (en) | 2013-12-02 | 2015-10-22 | Real-time distributed in memory search architecture |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/557,827 Continuation US9223875B2 (en) | 2013-12-02 | 2014-12-02 | Real-time distributed in memory search architecture |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160140235A1 (en) | 2016-05-19 |
Family
ID=53265539
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/557,827 Active US9223875B2 (en) | 2013-12-02 | 2014-12-02 | Real-time distributed in memory search architecture |
US14/920,202 Abandoned US20160140235A1 (en) | 2013-12-02 | 2015-10-22 | Real-time distributed in memory search architecture |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/557,827 Active US9223875B2 (en) | 2013-12-02 | 2014-12-02 | Real-time distributed in memory search architecture |
Country Status (1)
Country | Link |
---|---|
US (2) | US9223875B2 (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9424524B2 (en) | 2013-12-02 | 2016-08-23 | Qbase, LLC | Extracting facts from unstructured text |
US9223833B2 (en) | 2013-12-02 | 2015-12-29 | Qbase, LLC | Method for in-loop human validation of disambiguated features |
US9659108B2 (en) | 2013-12-02 | 2017-05-23 | Qbase, LLC | Pluggable architecture for embedding analytics in clustered in-memory databases |
US9177262B2 (en) | 2013-12-02 | 2015-11-03 | Qbase, LLC | Method of automated discovery of new topics |
US9430547B2 (en) | 2013-12-02 | 2016-08-30 | Qbase, LLC | Implementation of clustered in-memory database |
US9317565B2 (en) | 2013-12-02 | 2016-04-19 | Qbase, LLC | Alerting system based on newly disambiguated features |
US9230041B2 (en) | 2013-12-02 | 2016-01-05 | Qbase, LLC | Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching |
US9336280B2 (en) | 2013-12-02 | 2016-05-10 | Qbase, LLC | Method for entity-driven alerts based on disambiguated features |
US9547701B2 (en) | 2013-12-02 | 2017-01-17 | Qbase, LLC | Method of discovering and exploring feature knowledge |
US9201744B2 (en) | 2013-12-02 | 2015-12-01 | Qbase, LLC | Fault tolerant architecture for distributed computing systems |
US9619571B2 (en) | 2013-12-02 | 2017-04-11 | Qbase, LLC | Method for searching related entities through entity co-occurrence |
US9984427B2 (en) | 2013-12-02 | 2018-05-29 | Qbase, LLC | Data ingestion module for event detection and increased situational awareness |
US9424294B2 (en) | 2013-12-02 | 2016-08-23 | Qbase, LLC | Method for facet searching and search suggestions |
US9542477B2 (en) | 2013-12-02 | 2017-01-10 | Qbase, LLC | Method of automated discovery of topics relatedness |
US9223875B2 (en) | 2013-12-02 | 2015-12-29 | Qbase, LLC | Real-time distributed in memory search architecture |
US9355152B2 (en) | 2013-12-02 | 2016-05-31 | Qbase, LLC | Non-exclusionary search within in-memory databases |
CN106164890A (en) | 2013-12-02 | 2016-11-23 | 丘贝斯有限责任公司 | Method for disambiguating features in unstructured text |
US9208204B2 (en) | 2013-12-02 | 2015-12-08 | Qbase, LLC | Search suggestions using fuzzy-score matching and entity co-occurrence |
US9922032B2 (en) | 2013-12-02 | 2018-03-20 | Qbase, LLC | Featured co-occurrence knowledge base from a corpus of documents |
US9348573B2 (en) * | 2013-12-02 | 2016-05-24 | Qbase, LLC | Installation and fault handling in a distributed system utilizing supervisor and dependency manager nodes |
US9544361B2 (en) | 2013-12-02 | 2017-01-10 | Qbase, LLC | Event detection through text analysis using dynamic self evolving/learning module |
Family Cites Families (92)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2343097A (en) * | 1996-03-21 | 1997-10-10 | Mpath Interactive, Inc. | Network match maker for selecting clients based on attributes of servers and communication links |
US6178529B1 (en) | 1997-11-03 | 2001-01-23 | Microsoft Corporation | Method and system for resource monitoring of disparate resources in a server cluster |
US6353926B1 (en) | 1998-07-15 | 2002-03-05 | Microsoft Corporation | Software update notification |
US6266781B1 (en) | 1998-07-20 | 2001-07-24 | Academia Sinica | Method and apparatus for providing failure detection and recovery with predetermined replication style for distributed applications in a network |
US6338092B1 (en) * | 1998-09-24 | 2002-01-08 | International Business Machines Corporation | Method, system and computer program for replicating data in a distributed computed environment |
US6959300B1 (en) | 1998-12-10 | 2005-10-25 | At&T Corp. | Data compression method and apparatus |
US7099898B1 (en) | 1999-08-12 | 2006-08-29 | International Business Machines Corporation | Data access system |
US6738759B1 (en) | 2000-07-07 | 2004-05-18 | Infoglide Corporation, Inc. | System and method for performing similarity searching using pointer optimization |
US8692695B2 (en) | 2000-10-03 | 2014-04-08 | Realtime Data, Llc | Methods for encoding and decoding data |
US6832373B2 (en) | 2000-11-17 | 2004-12-14 | Bitfone Corporation | System and method for updating and distributing information |
US6691109B2 (en) | 2001-03-22 | 2004-02-10 | Turbo Worx, Inc. | Method and apparatus for high-performance sequence comparison |
GB2374687A (en) | 2001-04-19 | 2002-10-23 | Ibm | Managing configuration changes in a data processing system |
US7082478B2 (en) | 2001-05-02 | 2006-07-25 | Microsoft Corporation | Logical semantic compression |
US6961723B2 (en) * | 2001-05-04 | 2005-11-01 | Sun Microsystems, Inc. | System and method for determining relevancy of query responses in a distributed network search mechanism |
US20030028869A1 (en) | 2001-08-02 | 2003-02-06 | Drake Daniel R. | Method and computer program product for integrating non-redistributable software applications in a customer driven installable package |
US6954456B2 (en) * | 2001-12-14 | 2005-10-11 | At & T Corp. | Method for content-aware redirection and content renaming |
US6829606B2 (en) | 2002-02-14 | 2004-12-07 | Infoglide Software Corporation | Similarity search engine for use with relational databases |
US7421478B1 (en) | 2002-03-07 | 2008-09-02 | Cisco Technology, Inc. | Method and apparatus for exchanging heartbeat messages and configuration information between nodes operating in a master-slave configuration |
US8015143B2 (en) | 2002-05-22 | 2011-09-06 | Estes Timothy W | Knowledge discovery agent system and method |
US7570262B2 (en) | 2002-08-08 | 2009-08-04 | Reuters Limited | Method and system for displaying time-series data and correlated events derived from text mining |
US7249312B2 (en) | 2002-09-11 | 2007-07-24 | Intelligent Results | Attribute scoring for unstructured content |
US7058846B1 (en) | 2002-10-17 | 2006-06-06 | Veritas Operating Corporation | Cluster failover for storage management services |
US20040205064A1 (en) | 2003-04-11 | 2004-10-14 | Nianjun Zhou | Adaptive search employing entropy based quantitative information measurement |
US7543174B1 (en) | 2003-09-24 | 2009-06-02 | Symantec Operating Corporation | Providing high availability for an application by rapidly provisioning a node and failing over to the node |
US9009153B2 (en) | 2004-03-31 | 2015-04-14 | Google Inc. | Systems and methods for identifying a named entity |
US7818615B2 (en) | 2004-09-16 | 2010-10-19 | Invensys Systems, Inc. | Runtime failure management of redundantly deployed hosts of a supervisory process control data acquisition facility |
US7403945B2 (en) * | 2004-11-01 | 2008-07-22 | Sybase, Inc. | Distributed database system providing data and space management methodology |
US20060179026A1 (en) | 2005-02-04 | 2006-08-10 | Bechtel Michael E | Knowledge discovery tool extraction and integration |
US20060294071A1 (en) | 2005-06-28 | 2006-12-28 | Microsoft Corporation | Facet extraction and user feedback for ranking improvement and personalization |
US7630977B2 (en) | 2005-06-29 | 2009-12-08 | Xerox Corporation | Categorization including dependencies between different category systems |
US8386463B2 (en) | 2005-07-14 | 2013-02-26 | International Business Machines Corporation | Method and apparatus for dynamically associating different query execution strategies with selective portions of a database table |
US7681075B2 (en) | 2006-05-02 | 2010-03-16 | Open Invention Network Llc | Method and system for providing high availability to distributed computer applications |
US7447940B2 (en) | 2005-11-15 | 2008-11-04 | Bea Systems, Inc. | System and method for providing singleton services in a cluster |
US8341622B1 (en) | 2005-12-15 | 2012-12-25 | Crimson Corporation | Systems and methods for efficiently using network bandwidth to deploy dependencies of a software package |
US7899871B1 (en) | 2006-01-23 | 2011-03-01 | Clearwell Systems, Inc. | Methods and systems for e-mail topic classification |
US7519613B2 (en) | 2006-02-28 | 2009-04-14 | International Business Machines Corporation | Method and system for generating threads of documents |
US8726267B2 (en) | 2006-03-24 | 2014-05-13 | Red Hat, Inc. | Sharing software certification and process metadata |
US8190742B2 (en) | 2006-04-25 | 2012-05-29 | Hewlett-Packard Development Company, L.P. | Distributed differential store with non-distributed objects and compression-enhancing data-object routing |
US20070282959A1 (en) | 2006-06-02 | 2007-12-06 | Stern Donald S | Message push with pull of information to a communications computing device |
US8615800B2 (en) | 2006-07-10 | 2013-12-24 | Websense, Inc. | System and method for analyzing web content |
US7624118B2 (en) | 2006-07-26 | 2009-11-24 | Microsoft Corporation | Data processing over very large databases |
US8122026B1 (en) | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US7853611B2 (en) | 2007-02-26 | 2010-12-14 | International Business Machines Corporation | System and method for deriving a hierarchical event based database having action triggers based on inferred probabilities |
US8352455B2 (en) | 2007-06-29 | 2013-01-08 | Allvoices, Inc. | Processing a content item with regard to an event and a location |
US20090043792A1 (en) | 2007-08-07 | 2009-02-12 | Eric Lawrence Barsness | Partial Compression of a Database Table Based on Historical Information |
US10762080B2 (en) | 2007-08-14 | 2020-09-01 | John Nicholas and Kristin Gross Trust | Temporal document sorter and method |
GB2453174B (en) | 2007-09-28 | 2011-12-07 | Advanced Risc Mach Ltd | Techniques for generating a trace stream for a data processing apparatus |
KR100898339B1 (en) | 2007-10-05 | 2009-05-20 | 한국전자통신연구원 | Autonomous fault processing system in home network environments and operation method thereof |
US8396838B2 (en) | 2007-10-17 | 2013-03-12 | Commvault Systems, Inc. | Legal compliance, electronic discovery and electronic document handling of online and offline copies of data |
US8375073B1 (en) | 2007-11-12 | 2013-02-12 | Google Inc. | Identification and ranking of news stories of interest |
US8294763B2 (en) | 2007-12-14 | 2012-10-23 | Sri International | Method for building and extracting entity networks from video |
US8326847B2 (en) | 2008-03-22 | 2012-12-04 | International Business Machines Corporation | Graph search system and method for querying loosely integrated data |
US20100077001A1 (en) | 2008-03-27 | 2010-03-25 | Claude Vogel | Search system and method for serendipitous discoveries with faceted full-text classification |
US8712926B2 (en) | 2008-05-23 | 2014-04-29 | International Business Machines Corporation | Using rule induction to identify emerging trends in unstructured text streams |
US8358308B2 (en) | 2008-06-27 | 2013-01-22 | Microsoft Corporation | Using visual techniques to manipulate data |
US8171547B2 (en) | 2008-12-03 | 2012-05-01 | Trend Micro Incorporated | Method and system for real time classification of events in computer integrity system |
US8874576B2 (en) | 2009-02-27 | 2014-10-28 | Microsoft Corporation | Reporting including filling data gaps and handling uncategorized data |
US20100235311A1 (en) | 2009-03-13 | 2010-09-16 | Microsoft Corporation | Question and answer search |
US8213725B2 (en) | 2009-03-20 | 2012-07-03 | Eastman Kodak Company | Semantic event detection using cross-domain knowledge |
US8161048B2 (en) | 2009-04-24 | 2012-04-17 | At&T Intellectual Property I, L.P. | Database analysis using clusters |
US8055933B2 (en) | 2009-07-21 | 2011-11-08 | International Business Machines Corporation | Dynamic updating of failover policies for increased application availability |
WO2011046560A1 (en) | 2009-10-15 | 2011-04-21 | Hewlett-Packard Development Company, L.P. | Heterogeneous data source management |
US8645372B2 (en) | 2009-10-30 | 2014-02-04 | Evri, Inc. | Keyword-based search engine results using enhanced query strategies |
US20110125764A1 (en) | 2009-11-26 | 2011-05-26 | International Business Machines Corporation | Method and system for improved query expansion in faceted search |
WO2011092793A1 (en) | 2010-01-29 | 2011-08-04 | パナソニック株式会社 | Data processing device |
US9710556B2 (en) | 2010-03-01 | 2017-07-18 | Vcvc Iii Llc | Content recommendation based on collections of entities |
US8595234B2 (en) | 2010-05-17 | 2013-11-26 | Wal-Mart Stores, Inc. | Processing data feeds |
US8429256B2 (en) | 2010-05-28 | 2013-04-23 | Red Hat, Inc. | Systems and methods for generating cached representations of host package inventories in remote package repositories |
US8345998B2 (en) | 2010-08-10 | 2013-01-01 | Xerox Corporation | Compression scheme selection based on image data type and user selections |
US8321443B2 (en) | 2010-09-07 | 2012-11-27 | International Business Machines Corporation | Proxying open database connectivity (ODBC) calls |
US20120102121A1 (en) | 2010-10-25 | 2012-04-26 | Yahoo! Inc. | System and method for providing topic cluster based updates |
US8423522B2 (en) | 2011-01-04 | 2013-04-16 | International Business Machines Corporation | Query-aware compression of join results |
US20120246154A1 (en) | 2011-03-23 | 2012-09-27 | International Business Machines Corporation | Aggregating search results based on associating data instances with knowledge base entities |
KR20120134916A (en) | 2011-06-03 | 2012-12-12 | 삼성전자주식회사 | Storage device and data processing device for storage device |
US20120310934A1 (en) | 2011-06-03 | 2012-12-06 | Thomas Peh | Historic View on Column Tables Using a History Table |
US9104979B2 (en) | 2011-06-16 | 2015-08-11 | Microsoft Technology Licensing, Llc | Entity recognition using probabilities for out-of-collection data |
WO2013003770A2 (en) | 2011-06-30 | 2013-01-03 | Openwave Mobility Inc. | Database compression system and method |
US9032387B1 (en) | 2011-10-04 | 2015-05-12 | Amazon Technologies, Inc. | Software distribution framework |
US9026480B2 (en) | 2011-12-21 | 2015-05-05 | Telenav, Inc. | Navigation system with point of interest classification mechanism and method of operation thereof |
US9037579B2 (en) | 2011-12-27 | 2015-05-19 | Business Objects Software Ltd. | Generating dynamic hierarchical facets from business intelligence artifacts |
US10908792B2 (en) | 2012-04-04 | 2021-02-02 | Recorded Future, Inc. | Interactive event-based information system |
US20130290232A1 (en) | 2012-04-30 | 2013-10-31 | Mikalai Tsytsarau | Identifying news events that cause a shift in sentiment |
US8948789B2 (en) | 2012-05-08 | 2015-02-03 | Qualcomm Incorporated | Inferring a context from crowd-sourced activity data |
US9703833B2 (en) | 2012-11-30 | 2017-07-11 | Sap Se | Unification of search and analytics |
US9542652B2 (en) | 2013-02-28 | 2017-01-10 | Microsoft Technology Licensing, Llc | Posterior probability pursuit for entity disambiguation |
US9104710B2 (en) | 2013-03-15 | 2015-08-11 | Src, Inc. | Method for cross-domain feature correlation |
US8977600B2 (en) | 2013-05-24 | 2015-03-10 | Software AG USA Inc. | System and method for continuous analytics run against a combination of static and real-time data |
US9734221B2 (en) | 2013-09-12 | 2017-08-15 | Sap Se | In memory database warehouse |
US9223875B2 (en) | 2013-12-02 | 2015-12-29 | Qbase, LLC | Real-time distributed in memory search architecture |
US9201744B2 (en) | 2013-12-02 | 2015-12-01 | Qbase, LLC | Fault tolerant architecture for distributed computing systems |
US9424294B2 (en) | 2013-12-02 | 2016-08-23 | Qbase, LLC | Method for facet searching and search suggestions |
US9025892B1 (en) | 2013-12-02 | 2015-05-05 | Qbase, LLC | Data record compression with progressive and/or selective decomposition |
- 2014-12-02: US application US14/557,827, patent US9223875B2 (Active)
- 2015-10-22: US application US14/920,202, publication US20160140235A1 (Abandoned)
Also Published As
Publication number | Publication date |
---|---|
US20150154297A1 (en) | 2015-06-04 |
US9223875B2 (en) | 2015-12-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9223875B2 (en) | Real-time distributed in memory search architecture | |
US11888702B2 (en) | Intelligent analytic cloud provisioning | |
US10922316B2 (en) | Using computing resources to perform database queries according to a dynamically determined query size | |
US10713247B2 (en) | Executing queries for structured data and not-structured data | |
US11868359B2 (en) | Dynamically assigning queries to secondary query processing resources | |
US9430547B2 (en) | Implementation of clustered in-memory database | |
US20200050694A1 (en) | Burst Performance of Database Queries According to Query Size | |
US10169409B2 (en) | System and method for transferring data between RDBMS and big data platform | |
US9256633B2 (en) | Partitioning data for parallel processing | |
US11496588B2 (en) | Clustering layers in multi-node clusters | |
US9659108B2 (en) | Pluggable architecture for embedding analytics in clustered in-memory databases | |
US10877810B2 (en) | Object storage system with metadata operation priority processing | |
US11727004B2 (en) | Context dependent execution time prediction for redirecting queries | |
US20180004797A1 (en) | Application resiliency management using a database driver | |
CN114090580A (en) | Data processing method, device, equipment, storage medium and product | |
US11537616B1 (en) | Predicting query performance for prioritizing query execution | |
US8037184B2 (en) | Query governor with network monitoring in a parallel computer system | |
US11762860B1 (en) | Dynamic concurrency level management for database queries | |
US11841857B2 (en) | Query efficiency using merged columns | |
CN113785286B (en) | Querying data in a distributed storage system | |
US20230113301A1 (en) | Managing queries to non-relational databases with multiple paths to storage system | |
CN115687441A (en) | Data analysis method, device and system | |
성민영 | A Machine Learning-based Methodology to Detect I/O Performance Bottlenecks for Hadoop Systems |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: QBASE, LLC, VIRGINIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIGHTNER, SCOTT; WECKESSER, FRANZ; REEL/FRAME: 036956/0893. Effective date: 2014-12-01 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |