EP4473418A1 - Optimierung der zuweisung von computerressourcen für virtuelles lagerhaus - Google Patents

Optimierung der zuweisung von computerressourcen für virtuelles lagerhaus

Info

Publication number
EP4473418A1
EP4473418A1 EP23706193.2A EP23706193A EP4473418A1 EP 4473418 A1 EP4473418 A1 EP 4473418A1 EP 23706193 A EP23706193 A EP 23706193A EP 4473418 A1 EP4473418 A1 EP 4473418A1
Authority
EP
European Patent Office
Prior art keywords
virtual warehouse
computing device
virtual
warehouse
different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP23706193.2A
Other languages
English (en)
French (fr)
Inventor
Syed Shamaz Salim
Ganesh Bharathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital One Services LLC
Original Assignee
Capital One Services LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/590,358 external-priority patent/US20230244687A1/en
Priority claimed from US17/590,372 external-priority patent/US12236278B2/en
Application filed by Capital One Services LLC filed Critical Capital One Services LLC
Publication of EP4473418A1 publication Critical patent/EP4473418A1/de
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning

Definitions

  • aspects of the disclosure relate generally to data storage and retrieval. More specifically, aspects of the disclosure relate to managing virtual warehouses which execute queries with respect to a plurality of data warehouses
  • Snowflake architecture produced by Snowflake Inc. of San Mateo, CA, permits organizations to logically separate but natively integrate storage, computing, and services. Given the complexity and size of many data warehouses, the task of executing queries and collecting the results of those queries is often tasked to computing devices specially configured for that purpose. Such computing devices may be, as is the case with Snowflake, one or more servers which may instantiate virtual warehouses for a user to conduct searches within. Snowflake and similar “data warehouse as a service” platforms may thereby allow users and companies to offload complex and expensive data warehousing and query operations to a cloud provider.
  • a user seeking to query a multi-terabyte data warehouse may, rather than trying to execute the query and collect results on their laptop, send instructions to a virtual warehouse in the cloud that causes one or more servers to, via a virtual warehouse, perform the query on their behalf.
  • This allows the user to access the results of the data (e.g., in a user interface) from a relatively underpowered computing device.
  • systems like Snowflake have numerous benefits: they lower the processing burden on individual users’ computers when conducting queries, they lower the network bandwidth required for such queries (as, after all, data need not be downloaded to the user’s computer), and they (in many cases) speed up the overall query process significantly.
  • Snowflake architecture In addition to avoiding resource limitations associated with queries, another advantage of the Snowflake architecture is that it allows users to collect data in a way that is resilient. Because a user’ s laptop may be relatively underpowered, queries that request significant amounts of data may crash the laptop. Moreover, because a single device collects the results of a query, unexpected technical issues (e.g., power loss, Internet disconnects) may cause the entire query to fail.
  • the Snowflake architecture is equipped with built-in replication and failover/failback procedures which avoid such crashes, thereby ensuring that data continuity may be preserved.
  • Snowflake architecture improves conventional query execution is that Snowflake allows virtual warehouses to be created, modified, and destroyed as desired. This allows multiple queries to be executed simultaneously but separately.
  • the Snowflake architecture allows a first user from an organization to execute a first query in a first virtual warehouse at the same time that a second user from the same organization executes a second query in a second virtual warehouse.
  • the different virtual warehouses may be configured with different computing resources. Commonly, such virtual warehouses may be referred to as “extra-large,” “large,” “medium,” or “small” (that is, the virtual warehouses may be in “t-shirt sizes”), referring to a different relative quantity of computing resources available to a virtual warehouse.
  • a virtual warehouse for large, significant, and/or time-sensitive queries may be an “extra-large” virtual warehouse (and have a significant quantity of computing resources), whereas a virtual warehouse for relatively smaller, less significant, and/or more time-insensitive queries may be a “small” virtual warehouse (and have a relatively small quantity of computing resources).
  • the quantity of computing resources might be subdivided into different “nodes,” such that, for example, a “large” virtual warehouse may comprise eight nodes, whereas a “small” virtual warehouse may comprise two nodes.
  • a particular virtual warehouse may receive thousands of different queries, may collect and sort millions of different rows of data, and generally may be used by a wide variety of different users of an organization. That said, the nature of the queries (e.g., whether a query takes ten seconds or ten hours to complete) provided by users may change.
  • such queries may vary in so-called “t-shirt sizes” (e.g., extra-large, large, medium, small) based on the processing complexity of those queries (and, e.g., in turn, the processing time required to complete the query, which may itself be a function of computing resources available to complete the query).
  • the frequency of queries provided by users may vary: for example, a large quantity of smaller (e.g., easily processed) queries may be provided during the workday, whereas a smaller quantity of large (e.g., time-consuming) queries may be presented outside of work hours (and, e.g., as part of nighttime automated operations). As such, it is not uncommon for virtual warehouses to be improperly sized.
  • an organization may maintain an unnecessarily large virtual warehouse which processes and manages queries quickly but may cost that organization tens of thousands of extra dollars per month.
  • an organization may maintain an unnecessarily small virtual warehouse which is cheap but which takes excessively long amounts of time to process and manage queries.
  • aspects described herein relate to optimizing the sizing of virtual warehouses.
  • the process depicted herein balances the cost of virtual warehouses (which generally increases with the size of those virtual warehouses) with the efficiencies size provides in handling queries.
  • the aspects described herein optimize virtual warehouses for organizations.
  • a virtual warehouse can be tested with a variety of different configurations (e.g., at a variety of different sizes) such that an optimal configuration may be identified.
  • a testing database e.g., a zero-copy database
  • the emulated events may be real events emulated with respect to real data, such that realistic performance parameters may be collected, and those performance parameters may be used to select an optimized configuration for a virtual warehouse.
  • the testing process described herein leverages key functionalities of virtual warehouse as a service platforms (e.g., the Snowflake platform) in order to efficiently size virtual warehouses.
  • a computing device may be configured to log a plurality of different events associated with a data sharing platform. That data sharing platform, which may be the same as or similar to the Snowflake platform, may enable users to access one or more databases managed by the data sharing platform.
  • the data sharing platform may be configured to provide access to the data stored by the data sharing platform via one or more of a plurality of virtual warehouses.
  • Each of the plurality of virtual warehouses may comprise a respective set of computing resources configured to, e.g., execute one or more queries with respect to at least a portion of a plurality of data warehouses, collect results from the one or more queries, and provide access to the collected results.
  • the computing device may generate a testing database by duplicating at least one of the one or more databases.
  • the computing device may then predict an optimized virtual warehouse configuration for a first virtual warehouse by selecting a plurality of different warehouse configurations for the first virtual warehouse, measuring performance parameters of each of the plurality of different warehouse configurations by emulating, via the first virtual warehouse, the plurality of different events at the testing database, and selecting the optimized virtual warehouse configuration based on the performance parameters.
  • the plurality of different warehouse configurations may each correspond to a different set of computing resources available to the first virtual warehouse. Then, the computing device may output the optimized virtual warehouse configuration.
  • a computing device may be configured to generate a testing database that corresponds to the state of one or more databases managed by a data sharing platform at a point in time. For example, the computing device may generate a zero-copy clone of the one or more databases.
  • That data sharing platform which may be the same as or similar to the Snowflake platform, may enable users to access one or more databases managed by the data sharing platform.
  • the data sharing platform may be configured to provide access to the data stored by the data sharing platform via one or more of a plurality of virtual warehouses.
  • Each of the plurality of virtual warehouses may comprise a respective set of computing resources configured to, e.g., execute one or more queries with respect to at least a portion of a plurality of data warehouses, collect results from the one or more queries, and provide access to the collected results.
  • the computing device may determine a log of a plurality of different events executed, via one or more of the plurality of virtual warehouses and after the point in time, with respect to the one or more databases. For example, to log the plurality of different events, the computing device may log one or more queries executed with respect to the one or more databases The computing device may then predict an optimized virtual warehouse configuration for a first virtual warehouse.
  • the computing device may measure first performance parameters of a first warehouse configuration by replaying, via the first virtual warehouse, the plurality of different events at the testing database while the first virtual warehouse is configured in accordance with the first warehouse configuration.
  • the plurality of different warehouse configurations may each correspond to a different set of computing resources available to the first virtual warehouse.
  • the computing device may then modify the testing database by rolling the testing database back to the state of the one or more databases managed by the data sharing platform at the point in time.
  • the computing device may measure second performance parameters of a second warehouse configuration by replaying, via the first virtual warehouse, the plurality of different events at the testing database while the first virtual warehouse is configured in accordance with the second warehouse configuration.
  • the computing device may select the optimized virtual warehouse configuration based on the first performance parameters and the second performance parameters.
  • the computing device may then output the optimized virtual warehouse configuration.
  • FIG. 1 shows an example of a system in which one or more aspects described herein may be implemented.
  • FIG. 2 shows an example computing device in accordance with one or more aspects described herein.
  • FIG. 3 depicts computing devices, virtual warehouse servers, and data warehouses working in conjunction to execute queries.
  • FIG. 4 shows a flow chart which may be performed to determine an optimized virtual warehouse configuration.
  • FIG. 5 depicts an example deep neural network architecture for a model according to one or more aspects of the disclosure.
  • FIG. 6 depicts an example of performance parameters from which an optimized virtual warehouse configuration may be selected.
  • FIG. 7 shows a flow chart which may be performed to determine an optimized virtual warehouse configuration.
  • a virtual warehouse may comprise one or more computing devices which are configured to perform tasks associated with one or more queries, such as executing the one or more queries with respect to one or more data warehouses, collecting results from those one or more queries (e.g., from the one or more data warehouses), and/or providing those collected results to one or more user devices.
  • three virtual warehouses may be instantiated on a single computing device (e.g., a server), a plurality of computing devices (e.g., a distributed network of servers), or the like.
  • the availability of and/or use of a virtual warehouse may be associated with cost.
  • an organization may be charged based on a time in which a virtual warehouse is used, the size of a query, the amount of memory used by a query, or the like. Accordingly, virtual warehouses may be limited in their size (that is, the amount of computing resources available to them) to save money and to preserve computing resources. For example, for simple queries, a virtual warehouse may be instantiated with a relatively small quantity of computing resources (e.g., processor speed, memory) so as to lower the cost of maintaining and using that virtual warehouse. Moreover, multiple virtual warehouses may be available to an organization. For example, an organization may maintain a large virtual warehouse for significant and business-critical queries, whereas it may maintain a plurality of smaller virtual warehouses for more routine and less time-sensitive queries.
  • computing resources e.g., processor speed, memory
  • Methods, systems, apparatuses, and computer-readable media as described herein may relate to the analysis of how computing resources may be efficiently allocated to virtual warehouses in view of real-life events occurring with preexisting virtual warehouses.
  • a log of events associated with virtual warehouses e.g., queries provided to those virtual warehouses, sorting and/or viewing activities with respect to those virtual warehouses
  • a duplicate copy of one or more databases may be made.
  • the events may be emulated with respect to the duplicate copy of the one or more databases using a virtual warehouse, such that various configurations of that virtual warehouse may be tested.
  • performance parameters may be collected.
  • an optimized virtual warehouse configuration may be selected based on the performance parameters. In this manner, configurations for a virtual warehouse may be tested using real-world queries and real-world data, providing realistic performance parameters that help accurately estimate the impact of a particular virtual warehouse configuration.
  • the present disclosure is significantly different than conventional optimization processes at least in that it operates in view of the particularities and unique needs of virtual warehouses.
  • the present disclosure is far more than a mere instruction to decide the optimal size of a data warehouse: rather, the present disclosure processes and emulates a large quantity of real- world events against a duplicative database to generate test data that is, by virtue of the manner in which it is emulated, unique to the circumstance where such events are implemented through virtual warehouses.
  • This complexity is one reason why many users of virtual warehouse as a service platforms, such as the Snowflake platform, may inadvertently overspend for excessive computing resource allocations: because they do not have the tools for accurately testing the effectiveness of a particular virtual warehouse configuration, and because the particularities of virtual warehouse as a server platforms (e.g., the Snowflake platform) introduce complexities which effectively moot conventional database sizing considerations, they have no insight into the ramifications of changing such a configuration.
  • the present disclosure also improves the functioning of computers by improving the manner in which queries are executed with respect to one or more data warehouses.
  • Virtual warehouses provide an improvement to conventional query systems, but their misconfiguration and misuse can result in the waste of computing resources.
  • improvements to the manner in which queries are received by virtual warehouses may make those virtual warehouses more efficient. For example, by properly sizing a virtual warehouse based on testing various virtual warehouse configurations using real-world data and past events that have occurred with respect to that real-world data, the virtual warehouse can be configured in a manner which, e.g., does not unnecessarily waste computing resources and which does not cause queries to take an undesirably long time to execute.
  • the present disclosure is also fundamentally rooted in computing devices and, in particular, an environment with virtual warehouses.
  • virtual warehouse as a service platform architectures e.g., Snowflake’s architecture
  • a service platform architecture e.g., Snowflake’s architecture
  • compute environments e.g., different virtual warehouses
  • other database systems rely on monolithic systems to handle all enterprise needs. It is precisely this flexibility of these virtual warehouse as a service platforms that is addressed by the improvements discussed herein.
  • FIG. 1 shows a system 100.
  • the system 100 may include one or more computing devices 110, one or more data warehouses 120, and/or one or more virtual warehouse servers 130 in communication via a network 140.
  • network connections shown are illustrative and any means of establishing a communications link between the computers may be used.
  • the existence of any of various network protocols such as TCP/IP, Ethernet, FTP, HTTP and the like, and of various wireless communication technologies such as GSM, CDMA, WiFi, and LTE, is presumed, and the various computing devices described herein may be configured to communicate using any of these network protocols or technologies. Any of the devices and systems described herein may be implemented, in whole or in part, using one or more computing systems described with respect to FIG. 2.
  • the computing devices 110 may, for example, provide queries to the virtual warehouse servers 130 and/or receive query results from the virtual warehouse servers 130, as described herein.
  • the datawarehouses 120 may store data and provide, in response to queries, all or portions of the stored data, as described herein.
  • the data warehouses 120 may include, but are not limited to relational databases, hierarchical databases, distributed databases, in-memory databases, flat file databases, XML databases, NoSQL databases, graph databases, and/or a combination thereof.
  • the virtual warehouse servers 130 may execute, manage, resize, and otherwise control one or more virtual warehouses, as described herein.
  • one or more of the computing devices 110 may send a request to execute a query to one or more of the virtual warehouse servers 130, and one or more virtual warehouses of the virtual warehouse servers 130 may perform steps which effectuate that query with respect to one or more of the data warehouses 120.
  • the network 140 may include a local area network (LAN), a wide area network (WAN), a wireless telecommunications network, and/or any other communication network or combination thereof.
  • the virtual warehouse servers 130 and/or the data warehouses 120 may be all or portions of a cloud system.
  • the computing devices 110 may be located in a first location (e.g, the offices of a corporation), and the virtual warehouse servers 130 and/or the data warehouses 120 may be located in a variety of locations (e.g., distributed in a redundant manner across the globe). This may protect business resources: for example, if the Internet goes down in a first location, the distribution and redundancy of various devices may allow a business to continue operating despite the outage.
  • the virtual warehouse servers 130 may be all or portions of a virtual warehouse as a service system.
  • a service system is the Snowflake architecture.
  • any type of virtual warehouse as a service system may be implemented using the present disclosure.
  • the computing devices 110 and/or the data warehouses 120 may be managed by an organization.
  • the virtual warehouse servers 130 may be managed by a different entity, such as Snowflake Inc.
  • a third party e.g., Snowflake
  • a data warehouse such as any one of the data warehouses 120, may be one or more databases or other devices which store data.
  • a data warehouse may be a single database, a collection of databases, or the like.
  • a data warehouse may be structured and/or unstructured, such that, for example, a data warehouse may comprise a data lake.
  • a data warehouse may store data in a variety of formats and in a variety of manners.
  • a data warehouse may comprise textual data in a table, image data as stored in various file system folders, and the like.
  • the data transferred to and from various computing devices in a system 100 may include secure and sensitive data, such as confidential documents, customer personally identifiable information, and account data. Therefore, it may be desirable to protect transmissions of such data using secure network protocols and encryption, and/or to protect the integrity of the data when stored on the various computing devices.
  • a file-based integration scheme or a service-based integration scheme may be utilized for transmitting data between the various computing devices.
  • Data may be transmitted using various network communication protocols.
  • Secure data transmission protocols and/or encryption may be used in file transfers to protect the integrity of the data, for example, File Transfer Protocol (FTP), Secure File Transfer Protocol (SFTP), and/or Pretty Good Privacy (PGP) encryption.
  • FTP File Transfer Protocol
  • SFTP Secure File Transfer Protocol
  • PGP Pretty Good Privacy
  • one or more web services may be implemented within the various computing devices.
  • Web services may be accessed by authorized external devices and users to support input, extraction, and manipulation of data between the various computing devices in the system 100.
  • Web services built to support a personalized display system may be cross-domain and/or cross-platform, and may be built for enterprise use. Data may be transmitted using the Secure Sockets Layer (SSL) or Transport Layer Security (TLS) protocol to provide secure connections between the computing devices.
  • Web services may be implemented using the WS-Security standard, providing for secure SOAP messages using XML encryption.
  • Specialized hardware may be used to provide secure web services.
  • secure network appliances may include built-in features such as hardware- accelerated SSL and HTTPS, WS-Security, and/or firewalls. Such specialized hardware may be installed and configured in the system 100 in front of one or more computing devices such that any external devices may communicate directly with the specialized hardware.
  • the computing device 200 may be the same or similar as any one of the computing devices 110, the virtual warehouse servers 130, and/or the data warehouses 120 of FIG. 1.
  • the computing device 200 may include a processor 203 for controlling overall operation of the computing device 200 and its associated components, including RAM 205, ROM 207, input/output device 209, communication interface 211, and/or memory 215.
  • a data bus may interconnect processor(s) 203, RAM 205, ROM 207, memory 215, I/O device 209, and/or communication interface 211.
  • computing device 200 may represent, be incorporated in, and/or include various devices such as a desktop computer, a computer server, a mobile device, such as a laptop computer, a tablet computer, a smart phone, any other types of mobile computing devices, and the like, and/or any other type of data processing device.
  • Input/output (TO) device 209 may include a microphone, keypad, touch screen, and/or stylus through which a user of the computing device 200 may provide input, and may also include one or more of a speaker for providing audio output and a video display device for providing textual, audiovisual, and/or graphical output.
  • Software may be stored within memory 215 to provide instructions to processor 203 allowing computing device 200 to perform various actions.
  • memory 215 may store software used by the computing device 200, such as an operating system 217, application programs 219, and/or an associated internal database 221.
  • the various hardware memory units in memory 215 may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Memory 215 may include one or more physical persistent memory devices and/or one or more non-persistent memory devices.
  • Memory 215 may include, but is not limited to, random access memory (RAM) 205, read only memory (ROM) 207, electronically erasable programmable read only memory (EEPROM), flash memory or other memory technology, optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store the desired information and that may be accessed by processor 203.
  • Communication interface 211 may include one or more transceivers, digital signal processors, and/or additional circuitry and software for communicating via any network, wired or wireless, using any protocol as described herein.
  • Processor 203 may include a single central processing unit (CPU), which may be a single-core or multi-core processor, or may include multiple CPUs. Processor(s) 203 and associated components may allow the computing device 200 to execute a series of computer- readable instructions to perform some or all of the processes described herein. Although not shown in FIG.
  • various elements within memory 215 or other components in computing device 200 may include one or more caches, for example, CPU caches used by the processor 203, page caches used by the operating system 217, disk caches of a hard drive, and/or database caches used to cache content from database 221.
  • the CPU cache may be used by one or more processors 203 to reduce memory latency and access time.
  • a processor 203 may retrieve data from or write data to the CPU cache rather than reading/writing to memory 215, which may improve the speed of these operations.
  • a database cache may be created in which certain data from a database 221 is cached in a separate smaller database in a memory separate from the database, such as in RAM 205 or on a separate computing device.
  • a database cache on an application server may reduce data retrieval and data manipulation time by not needing to communicate over a network with a back-end database server.
  • caches and others may be included in various embodiments, and may provide potential advantages in certain implementations of devices, systems, and methods described herein, such as faster response times and less dependence on network conditions when transmitting and receiving data.
  • FIG. 3 shows a system comprising the computing devices 110, the virtual warehouse servers 130, and the data warehouses 120 of FIG. 1.
  • FIG. 3 may depict all or portions of a system configured according to the Snowflake architecture or a similar architecture permitting use of one or more virtual warehouses.
  • FIG. 3 also depicts various elements which may be portions of those computing devices, as well as transmissions between those devices.
  • the computing devices 110 are shown having a request application 301
  • the virtual warehouse servers 130 are shown having a virtual warehouse manager application 302 and three virtual warehouses (a virtual warehouse A 303a, a virtual warehouse B 303b, and a virtual warehouse C 303c)
  • the data warehouses 120 are shown comprising a data warehouse A 304a and a data warehouse B 304b.
  • the computing devices 110 may be users’ personal computing devices
  • the virtual warehouse servers 130 may be cloud servers managed by a virtual warehouse as a service platform organization, such as Snowflake Inc., of San Mateo, CA.
  • the request application 301 may transmit, to the virtual warehouse manager application 302, a request for a query.
  • the transmitted request may be in a variety of formats which indicate a request for a query to be executed.
  • the request may comprise a structured query which may be directly executed on one or more of the data warehouses 120 (such as an SQL query), and/or may comprise a vaguer request for data (e.g., a natural language query, such as a request for “all data in the last month”).
  • the request application 301 may be any type of application which may transmit a request to the virtual warehouse manager application 302, such as a web browser (e.g., showing a web page associated with the virtual warehouse manager application 302), a special-purpose query application (e.g., as part of a secure banking application, such as may execute on a tablet or smartphone), an e-mail application (e.g., such that the request to the virtual warehouse manager application 302 may be transmitted via e-mail), or the like.
  • the request may be input by a user in a user interface of the request application 301 and using, for example, a keyboard, a mouse, voice commands, a touchscreen, or the like.
  • the virtual warehouse manager application 302 may select one of a plurality of available virtual warehouses (in this case, the virtual warehouse C 303C) to execute the query. As part of this process, the virtual warehouse manager application may determine which of a plurality of virtual warehouses should address the request received in step 305. The virtual warehouse manager application 302 may identify an execution plan for the query by determining one or more sub-queries to be executed with respect to one or more of the data warehouses 120. For example, the request may comprise querying both the data warehouse A 304a and the data warehouse B 304b for different portions of data. The virtual warehouse manager application 302 may, based on the query and the execution plan, predict a processing complexity of the query.
  • the processing complexity of the query may correspond to a time to complete the query (e.g., the time required to perform all steps of the execution plan), a quantity of computing resources (e.g., processor time, memory) required to execute the query, or the like.
  • the virtual warehouse manager application 302 may additionally and/or alternatively determine an operating status of the plurality of virtual warehouses and/or processing capabilities of the plurality of virtual warehouses.
  • the virtual warehouse A 303a is shown as being large (e.g., having relatively significant processing capabilities) but having a utilization of 99% (that is, being quite busy), the virtual warehouse B 303b is shown as being large and having a utilization of 5% (that is, being quite free), and the virtual warehouse C 303c is shown as being small and having a utilization of 5%.
  • a subset of the plurality of virtual warehouses may be selected.
  • that subset may comprise both the virtual warehouse B 303b and the virtual warehouse C 303c, at least because both have a low utilization rate and thus may be capable of handling the request received from the request application 301.
  • one or more virtual warehouses may be selected to execute the query.
  • the virtual warehouse C 303c has been selected to address the query. This may be because, for example, the query may be small (that is, the execution plan may be simple or otherwise quick to handle), such that executing the query on the virtual warehouse C 303c may be cheaper and may free up the virtual warehouse B 303b for handling larger, more complex queries.
  • Virtual warehouses such as the virtual warehouse A 303 a, the virtual warehouse B 303b, and/or the virtual warehouse C 303c, may comprise a respective set of computing resources.
  • each virtual warehouse may execute on one or a plurality of servers (e.g., the virtual warehouse servers 130), and each virtual warehouse may be apportioned a particular quantity of computing resources (e.g., computing processor speed, memory, storage space, bandwidth, or the like).
  • such quantities of computing resources may be referred to via “t-shirt sizes,” such that one virtual warehouse may be referred to as “large,” whereas another may be referred to as “small.”
  • Virtual warehouses may be resized such that, for example, the virtual warehouse A 303a (which is large) may be shrunk down to a smaller size to save money and/or to allocate resources to another virtual warehouse.
  • Virtual warehouses may also have different utilization rates. For example, a virtual warehouse using substantially all of its resources to execute a query may be said to be fully occupied (that is, to have a utilization rate of approximately 100%), whereas a virtual warehouse not performing any tasks may be said to be free (that is, to have a utilization rate of approximately 0%).
  • the size of the virtual warehouses may affect the utilization rate: for example, a larger virtual warehouse may be capable of handling more queries at the same time as compared to a relatively smaller virtual warehouse.
  • virtual warehouses may be configured to execute one or more queries with respect to at least a portion of the data warehouses 120, collect results from the one or more queries, and provide, to one or more computing devices, access to the collected results.
  • the size and/or utilization of a particular virtual warehouse may impact its ability to execute queries, collect results, and provide those results.
  • Virtual warehouses such as the virtual warehouse A 303 a, the virtual warehouse B 303b, and/or the virtual warehouse C 303c, may be resized based on a schedule.
  • a single virtual warehouse e.g., the virtual warehouse A 303a
  • a schedule may be defined by an administrator, may be based on a use pattern specific to the virtual warehouse, and/or may be based on a pattern of activity, by one or more users, corresponding to one or more different virtual warehouses.
  • the virtual warehouse manager application 302 may monitor use of virtual warehouses and determine that, during business hours, the virtual warehouses are used more frequently. Based on such a determination, the virtual warehouse manager application 302 may configure one or more virtual warehouses with a schedule that causes those one or more virtual warehouses to be larger during business hours and smaller during non-business hours. This may advantageously save money for an organization: by dynamically scaling the size of virtual warehouses, needlessly large (and thereby needlessly expensive) virtual warehouses need not be maintained.
  • the virtual warehouse manager application 302 is shown as part of the virtual warehouse servers 130, the virtual warehouse manager application 302 may execute on a wide variety of computing devices.
  • the virtual warehouse manager application may execute on one or more of the computing devices 110, such as the same computing device 110 hosting the request application 301.
  • the virtual warehouse manager application may execute on an entirely separate computing device. Because the virtual warehouse manager application 302 may perform steps above and beyond conventional virtual warehouse functionality, the application may execute on an entirely separate computing device and may interface with preexisting virtual warehouse systems, e.g., Snowflake.
  • the selected virtual warehouse may execute the query requested by the request application 301. As shown in FIG. 3, this entails querying both the data warehouse A 304a and the data warehouse B 304b.
  • the data warehouses 120 such as the data warehouse A 304a and the data warehouse B 304b, need not be the same: for example, the data warehouse A 304a may have an entirely different format, may have entirely different schedules which affect their size at any given time, and may have an entirely different structure as compared to the data warehouse B 304b.
  • the data warehouse A 304a may comprise a SQL database
  • the data warehouse B 304b may comprise a fde server which stores files according to the File Allocation Table (FAT) file system.
  • the virtual warehouse C 303c may receive, store, and/or organize results from the data warehouses 120.
  • the virtual warehouse C 303c may receive query results from the data warehouse A 304a and the data warehouse B 304b, may store those results in memory, and then may encrypt those results for security purposes.
  • the virtual warehouse C 303c provides the collected results to the virtual warehouse manager application 302.
  • the virtual warehouse manager application 302 provides the results to one or more of the computing devices 110.
  • This process is optional, as the virtual warehouse C 303c may, in some instances, provide the results directly to one or more of the computing devices 110.
  • the results need not be provided back to the request application 301: for example, the results may be provided to an entirely different computing device (e.g., such that the request may have been received from a smartphone but the results may be delivered to an associated laptop) and/or may be provided to an entirely different application (e.g., such that the request may have been received via the request application 301, but the results may be received by a separate application, such as a spreadsheet application, executing on one or more of the computing devices 110).
  • a separate application such as a spreadsheet application
  • step 305a is reflected as an arrow directly leading from the request application 301 to one or more of the virtual warehouse servers 130, the request may in fact be routed through various other computing devices as part of the network 140.
  • the query process reflected in step 305c and step 305d may involve a plurality of different transmissions between the virtual warehouse C 303 c and the data warehouses 120.
  • FIG. 4 depicts a flowchart with steps which may be performed by a computing device, such as one or more of the computing devices 110, the virtual warehouse servers 130, and/or the data warehouses 120.
  • a computing device such as one or more of the computing devices 110, the virtual warehouse servers 130, and/or the data warehouses 120.
  • One or more non -transitory computer-readable media may store instructions that, when executed by one or more processors of a computing device, cause performance of one or more of the steps of FIG. 4.
  • the steps depicted in FIG. 4 may operate on a Snowflake environment or other virtual warehouse environment, such that they may be performed by a computing device within or external to such an environment.
  • the steps depicted in FIG. 4 may be performed on a user device external to a preexisting virtual warehouse environment.
  • the computing device may log events.
  • the computing device may log a plurality of different events associated with a data sharing platform, such as Snowflake.
  • An event may comprise any activity associated with one or more databases, virtual warehouses, or the like.
  • One example of an event may be a write action to a database of one or more databases.
  • Another example of an event may be a read action to a database of one or more databases.
  • the logged events may represent one or more actions that have been taken with respect to one or more virtual warehouses and/or one or more databases, such as queries, requests to view content, or the like.
  • step 401 may comprise the computing device receiving a preexisting log of events.
  • a log may be maintained of one or more events that have occurred with respect to one or more virtual warehouses and/or one or more databases.
  • the computing device may receive (e.g., retrieve) this log, rather than generating it.
  • Events may be associated with different sizes. For example, one query may be considered large in that it requires a relatively large quantity of computing resources to complete, whereas another query may be considered small in that it requires a relatively smaller quantity of computing resources to complete.
  • the size of an event may implicate how long it takes for a virtual warehouse to complete the event. For example, a large sized query may be executed quickly on a large virtual warehouse, whereas a small virtual warehouse may take quite a long time to complete the large sized query.
  • a user need not specify the size of a query; rather, the size of the query itself may be determined based on an execution plan associated with the query. For example, as indicated above, an execution time may be predicted based on an execution plan corresponding to a query. As such, the particular size of a query may correlate to the predicted execution time of a query: for example, a query requiring less than one second may be considered small, whereas any query over one minute may be considered large.
  • a plurality of events may comprise events of a variety of different sizes.
  • the log determined in step 401 may comprise a first event that satisfies a processing time threshold (e.g., and is considered “large”), whereas the log may also comprise a second event that does not satisfy the processing time threshold (and, e.g., is not considered “large”).
  • the plurality of events may comprise a wide variety of different types of events, thus better representing the real variety of events which may occur with respect to a real -world virtual warehouse. This is but one example of an advantage of the present disclosure over other systems: by using real-world events, a wide variety of such events can be collected (and, later, emulated), thus providing significantly more robust and accurate simulation data.
  • queries may be added to a group of small queries if the queries took less than a minute to execute on a virtual warehouse. Queries may be added to a group of medium queries if the queries took between one and five minutes to execute on a virtual warehouse. Queries may be added to a group of large queries if the queries took over five minutes to execute.
  • the log may correspond to a particular time period.
  • a log of a plurality of events may correspond to workdays, the workweek, or the like.
  • the log may be segregated and/or otherwise arranged based on time, such that, e.g., events may be categorized based on time of day (e.g., during the workday, outside of working hours), based on day of the week (e.g, weekdays versus weekends), or the like.
  • Such information may be advantageously used to emulate different times, such that, e.g., the computing device may emulate both workday conditions and weekend conditions.
  • Such information may be advantageous to determine whether, for example, a virtual warehouse should be resized during certain hours (e.g., the evening).
  • Query history functionalities can store information about previously-executed queries.
  • the Query History functionality of the Snowflake platform stores detailed information about any query that has been executed in a Snowflake system over a time period (e.g., one year).
  • Each query is assigned a unique query identifier.
  • each query can be uniquely identified and, as will be described in more detail below, emulated.
  • the computing device may generate a testing database.
  • the computing device may generate a testing database by duplicating at least one of the one or more databases. This process may comprise generating a zero-copy form of a database at a particular time.
  • the testing database may be a copy of a database at a particular time corresponding to immediately before the events in the log of step 401 began. In this manner, the events from the log of step 401 may be re-run on the testing database to produce the current version of a database.
  • the computing device may generate one or more testing databases in a manner which copies the state of one or more databases at a particular point in time.
  • the computing device may generate a testing database in a manner which causes the testing database to represent the state of a particular database at a particular point in time before the events logged in step 401.
  • the computing device may generate a testing database that reflects the state of a database before the events logged in step 401 were logged, such that one or more of those events may be replayed with respect to the database.
  • a time travel functionality such as the Snowflake time travel functionality, which enables accessing of historical data at any point within a defined period, may be used.
  • the testing database need not always reflect the database at a point in time before the events logged in step 401.
  • the testing database may reflect a latest form of the database, such that older queries are executed against the current form of the database.
  • changes to the database e.g., a recent significant growth in size and/or complexity
  • the events may be filtered and/or otherwise adjusted such that they may be performed against the testing database without causing inconsistencies to emerge.
  • an event logged in step 401 indicates that a user changed a particular cell (e.g., cell 03215) to a particular value (e.g., “Red”)
  • the testing database may already reflect that change.
  • the computing device may modify the event (e.g., to instead change the cell to a different value, such as “Black” or “Blue”), the computing device may remove the event from the log, and/or the computing device may allow the event to remain in the log (e.g., such that the value “Red” is changed to “Red,” though doing so would ordinarily be a waste of computing resources).
  • testing database may be generated by copying all or portions of an existing portion of a virtual warehouse environment.
  • the testing database may be a copy of an existing database in a virtual warehouse environment.
  • the texting database may be generated using a zero-copy functionality, such as the zero-copy functionality of the Snowflake platform.
  • Snowflake’s zero-copy cloning feature may take a snapshot of any table, schema, or database and creates a derived copy of that object which initially shares the underlying storage. This functionality may be useful for creating instant backups that might not incur any particular storage costs in a virtual warehouse architecture. In other words, this process may allow for the creation of a backup that does not require that data be copied from one storage device or another.
  • the clone might not use data storage because it shares all of the existing micro-partitions of the original table at the time it was closed. That said, as rows are added/deleted/updated in the clone (and independently from the original table), new micropartitions may be created that are associated with the clone (and not the original database). In this manner, the original data may be preserved even when the testing database is modified during testing.
  • the testing database may be a copy of the database with which the events from the log in step 401 were performed. For example, if the events in the log received in step 401 are with respect to a particular database, then the testing database may be a copy of that particular database. In this manner, the same events may be emulated against the same data.
  • the computing device may select a plurality of different virtual warehouse configurations.
  • the computing device may select a plurality of different virtual warehouse configurations for the first virtual warehouse.
  • Each of the plurality of different warehouse configurations may correspond to a different set of computing resources available to the first virtual warehouse.
  • one virtual warehouse configuration may have a different quantity of memory as compared to another configuration.
  • one virtual warehouse configuration may have a different processor speed than another configuration. As indicated above, such configurations may be referred to in terms of t-shirt sizes.
  • one virtual warehouse configuration may be referred to as “large” if it has at least a first amount of memory, whereas a virtual warehouse may be referred to as “small” if it has less than or equal to a second amount of memory.
  • each respective size may be double the number of computing resources (e.g., nodes) as a previous size.
  • a “large” size may be double the size of a “medium” size.
  • a virtual warehouse may be multi-cluster in that they may use multiple sets of computing resources. Additionally and/or alternatively, a virtual warehouse may be multi-node in that it may entail use of multiple nodes. As such, for example, one “extra small” virtual warehouse may comprise a single node, whereas a “large” virtual warehouse may comprise eight nodes. As such, one virtual warehouse configuration may indicate that a virtual warehouse is to comprise only one node and/or one cluster, whereas another virtual warehouse configuration may indicate that a virtual warehouse can comprise up to a certain number of nodes and/or clusters.
  • Different virtual warehouse configurations may correspond to different scaling policies. Different virtual warehouses may scale to different sizes (e.g., quantity of computing resources, quantity of clusters, quantity of nodes) at different times. For example, one virtual warehouse configuration may cause a virtual warehouse to scale down its number of computing resources at night, whereas another virtual warehouse configuration may cause a virtual warehouse to dynamically add or remove clusters and/or nodes based on its workload.
  • sizes e.g., quantity of computing resources, quantity of clusters, quantity of nodes
  • Different virtual warehouse configurations may correspond to different auto suspend policies.
  • Virtual warehouses need not be available at all times, and some virtual warehouses may be disabled during certain periods of time and/or based on utilization.
  • a virtual warehouse configuration may cause a virtual warehouse to automatically suspend itself (and thereby preserve computing resources and money) when it has not been used for over an hour.
  • a virtual warehouse configuration may cause a virtual warehouse to automatically suspend itself at night.
  • Different virtual warehouse configurations may correspond to different durations of time. For example, certain virtual warehouses may be instantiated for different periods of time (e.g., one hour, one day, one week). For example, one virtual warehouse configuration may indicate that a virtual warehouse should be available for only an hour, whereas another virtual warehouse configuration may indicate that a virtual warehouse should be available for only a week.
  • the computing device may measure performance parameters for each of the plurality of different warehouse configurations by, e.g., emulating the logged events from step 401. For example, the computing device may measure performance parameters of each of the plurality of different warehouse configurations by emulating, via the first virtual warehouse, the plurality of different events at the testing database. Such performance parameters may comprise, for example, the execution time of a particular event (e.g., as recorded by the EXECUTION TIME field of the Snowflake architecture). An example of such performance parameters is provided as FIG. 6, which is discussed in further detail below.
  • the computing device may first select, from the plurality of different warehouse configurations, a particular virtual warehouse configuration. The computing device may then configure (e.g, instantiate) a first virtual warehouse based on the particular virtual warehouse configuration. Then, the computing device may cause the plurality of different events to be executed with respect to the first virtual warehouse.
  • the computing device need not provide all of the plurality of different events to the first virtual warehouse at once: after all, doing so may easily overwhelm even the largest virtual warehouse, particularly if the plurality of different events is particularly voluminous.
  • the computing device may, e.g., emulate real activity by causing the plurality of different events to be executed with respect to the first virtual warehouse across a time period. For example, the computing device may insert random delays between the initiation of each of the plurality of different events, thereby emulating the potentially random nature of the plurality of different events. After all, it may be largely unpredictable as to when certain events may initiate in real life.
  • emulation of the plurality of different events may comprise causing each of the plurality of different events to be initiated at a different time. For example, a first event may be started at 10:01, whereas a second event may be started at 10:04. As some events may be scheduled at particular times, the emulation of the plurality of different events may take into account that certain events may be initiated at those particular times. For example, if a particular event is initiated every two minutes, then the particular event may be emulated such that it repeats every two minutes. As another example, if a particular event only occurs at a particular wall clock time (e.g., midnight), then the particular event may be emulated along with other events that were logged at or around midnight.
  • a particular wall clock time e.g., midnight
  • the performance parameters may indicate, for example, that a large virtual warehouse took ten seconds on average to complete large queries, five seconds on average to complete medium queries, and one second on average to complete small queries.
  • Those performance parameters may also indicate, for example, that a small virtual warehouse took twenty minutes on average to complete large queries, one minute on average to complete medium queries, and thirty seconds on average to complete small queries.
  • performance parameters may be collected for each query emulated as part of step 404.
  • performance parameters may be correlated with unique query identifiers (e.g., the query identifiers assigned to queries in the Snowflake Query History functionality).
  • the size of a virtual warehouse may change the speed with which a particular event is completed by the virtual warehouse.
  • a size of an event (that is, the processing time required for a particular event) may change.
  • a small virtual warehouse may take fifteen minutes to perform a particular query, meaning that the query may in that circumstance belong to a group of large queries. That said, a large virtual warehouse may take less than a minute to perform the same query, meaning that the same query may in that circumstance belong to a group of small queries. This shift from a large query to a small query may be a useful indicator of the value of increasing the size of a virtual warehouse form small to large.
  • the computing device may select an optimized virtual warehouse configuration. This selection may be based on the performance parameters measured in step 404. The computing device need not select the virtual warehouse that performs the most quickly or efficiently. Rather, the selection of the optimized virtual warehouse configuration may be based on a balance between the increasing cost of the size of a virtual warehouse as compared to the incremental value such size provides in processing speed.
  • the selection of the optimized virtual warehouse configuration may be based on the intended use of the virtual warehouse.
  • events e.g., queries
  • the speed of the queries may be worth paying significant sums.
  • paying for such a virtual warehouse may be worthwhile to ensure that queries are speedily processed.
  • time is of a concern
  • a small virtual warehouse is approximately equally speedy, then using such a small virtual warehouse may be tolerable.
  • the performance parameters may indicate that a large virtual warehouse takes one minute on average to process large queries, whereas a small virtual warehouse takes fifteen minutes on average to process the same queries.
  • the cost difference between the large virtual warehouse and the small virtual warehouse may be, for example, fifty thousand dollars a month.
  • it may be optimal for the virtual warehouse to be small if, for example, the virtual warehouse is to be used occasionally and/or by a department of an organization that does not require their queries to be answered particularly quickly. In other words, the incremental value of the fourteen minutes saved might not be worth the fifty thousand dollar a month cost.
  • step 403 through 405 may entail use of a machine learning model.
  • An example of a neural network architecture via which such machine learning models may be implemented is provided as FIG. 5, which is discussed below.
  • Machine learning models may be trained to aid in the selection of an optimized virtual warehouse configuration such that, for example, wider varieties of data may be taken into account when recommending a particular virtual warehouse configuration.
  • the computing device may train, using training data, a machine learning model to output a recommended virtual warehouse configuration.
  • That training data may comprise a history of different virtual warehouse configurations and a history of different event processing times.
  • the training data need not be specific to certain data, but may reflect a broad set of event processing times across a variety of different virtual warehouse configurations and/or a variety of different databases.
  • the computing device may then provide, to the trained machine learning model, input comprising the performance parameters determined in step 404.
  • the computing device may then receive, from the trained machine learning model, output indicating the optimized virtual warehouse configuration.
  • the machine learning model may recommend an optimized virtual warehouse configuration based on both historical processing times (as reflected in the training data) and the performance parameters (e.g, that are specific to a particular set of data).
  • the computing device may output the optimized virtual warehouse configuration
  • the output may be in a user interface.
  • the output may comprise a user interface element which prompts a user to accept the optimized virtual warehouse configuration.
  • upon user acceptance by, e.g., selecting a button, one or more existing virtual warehouses may be automatically resized based on the optimized virtual warehouse configuration.
  • FIG. 5 depicts an example deep neural network architecture 500.
  • the architecture depicted in FIG. 5 need not be performed on a single computing device, and may be performed by, e.g., a plurality of computers (e.g., any one of the devices depicted in FIG. 1).
  • An artificial neural network may be a collection of connected nodes, with the nodes and connections each having assigned weights used to generate predictions. Each node in the artificial neural network may receive input and generate an output signal. The output of a node in the artificial neural network may be a function of its inputs and the weights associated with the edges.
  • the trained model may be provided with input beyond the training set and used to generate predictions regarding the likely results.
  • Artificial neural networks may have many applications, including object classification, image recognition, speech recognition, natural language processing, text recognition, regression analysis, behavior modeling, and others.
  • the weights of each connection and/or node may be adjusted in a learning process as the model adapts to generate more accurate predictions on a training set.
  • the weights assigned to each connection and/or node may be referred to as the model parameters.
  • the model may be initialized with a random or white noise set of initial model parameters.
  • the model parameters may then be iteratively adjusted using, for example, stochastic gradient descent algorithms that seek to minimize errors in the model.
  • FIG. 6 depicts an example of performance parameters 600 via which an optimized virtual warehouse configuration may be selected.
  • the performance parameters 600 depicted in FIG. 6 may be, for example, the same or similar as the performance parameters discussed with respect to step 404 of FIG. 4.
  • the performance parameters 600 are depicted as a table with four columns: a virtual warehouse configuration column 601a, an average large event speed column 601b, an average medium event speed column 601c, an average small event speed 60 Id, and a cost column 60. These columns are illustrative, and represent one way in which performance parameters might be collected. For example, in some instances, each separate event might be measured, such that averages (such as those depicted in step FIG. 6) need not be collected.
  • the first data row 602a indicates that a large virtual warehouse configuration costing $30,000 a month completed large events after an average of seven seconds, medium events after an average of two seconds, and small events in less than one second.
  • the second data row 602b indicates that a medium virtual warehouse configuration costing $5,000 a month completed large events after an average of fifty-eight seconds, medium events after an average of ten seconds, and small events after an average of five seconds.
  • the third data row 602c indicates that a small virtual warehouse configuration costing $1,000 a month completed large events after an average of five minutes, medium events after an average of one minute, and small events after an average of fifty seconds.
  • the performance parameters 600 might be hundreds of different rows which reflect a variety of different possible combinations of virtual warehouse configurations e.g., virtual warehouse size, number of clusters and/or nodes in use at any given time, scaling policies, auto suspend policies, duration(s) of time with which the virtual warehouse is available, etc.).
  • an optimized virtual warehouse configuration may be determined by replaying events on a past version of a database.
  • a snapshot of one or more databases may be captured, and events logged after that snapshot may be replayed in a variety of different configurations.
  • Such a testing approach may be taken where, for example, the size and/or complexity of the database remains substantially the same over time, such that testing of an older form of a database might indicate appropriate configurations for virtual warehouses implementing queries on future forms of that same database.
  • such an approach may allow for testing to be more accurate in certain circumstances: because the events logged are replayed against the same database in the same state (and can be replayed in what amounts to a time-shifted manner), realistic performance parameters may be measured.
  • FIG. 7 depicts a flowchart with steps which may be performed by a computing device, such as one or more of the computing devices 110, the virtual warehouse servers 130, and/or the data warehouses 120.
  • a computing device such as one or more of the computing devices 110, the virtual warehouse servers 130, and/or the data warehouses 120.
  • One or more non -transitory computer-readable media may store instructions that, when executed by one or more processors of a computing device, cause performance of one or more of the steps of FIG. 7.
  • the steps depicted in FIG. 7 may operate on a Snowflake environment or other virtual warehouse environment, such that they may be performed by a computing device within or external to such an environment.
  • the steps depicted in FIG. 7 may be performed on a user device external to a preexisting virtual warehouse environment.
  • the computing device may snapshot an initial state of one or more databases.
  • This initial state may comprise an initial state of the data stored of the one or more databases at a particular point in time, such as a point in time before one or more events are logged.
  • This snapshot may be taken by performing a copy (e.g., a zero copy) of the one or more databases, using the time travel functionality of Snowflake, or similar methods.
  • the computing device may generate the testing database by generating a zero-copy clone of the one or more databases.
  • the computing device may store a particular version of one or more databases at a particular point in time.
  • the computing device may generate, at a first time, a testing database that corresponds to the state of one or more databases managed by a data sharing platform at a point in time.
  • step 701 need not be performed before events are logged. For example, in some instances, various database states may be preserved over time, and various events may be logged over time. As such, it may be possible to simply select a particular state of a database at a particular point in time, then to retrieve all events performed with respect to that database after the particular point in time.
  • the computing device may determine a log of one or more events.
  • the computing device may log the one or more events by logging one or more queries executed with respect to the one or more databases.
  • This step may be the same or similar as step 401 of FIG. 4.
  • the computing device may determine a log of a plurality of different events executed, via one or more of the plurality of virtual warehouses and after a point in time (e.g., that corresponding to a snapshot of a database), with respect to the one or more databases.
  • the logging in step 702 may be performed on a continual basis, such that it need not be performed after snapshot(s) of databases are captured, and/or need not end at any particular time.
  • this logging may comprise use of the Snowflake Query History functionality.
  • the events may be filtered and/or otherwise organized such that the logged events in step 702 comprise events that occurred after a time associated with the initial state of the databases snapshot in step 701.
  • step 704 the computing device may select a plurality of different warehouse configurations.
  • two configurations (configuration A 705a and configuration B 705b) have been selected, and are depicted as branching paths. With that said, any number of configurations may be selected and tested. Indeed, hundreds of different possible configurations might be selected and tested.
  • This step may be the same or similar as step 403 of FIG. 4.
  • FIG. 7 depicts the configuration A 705 and configuration B 705b as branching paths, this need not suggest that the steps in these branching paths be performed in parallel. Indeed, depending on various considerations (e.g., computing resource availability, speed requirements), the steps in each branching path might be performed sequentially, partially in parallel, entirely in parallel, or some combination thereof.
  • step 706a the computing device may configure the virtual warehouses with the configuration A 705a.
  • step 706b the computing device may configure the virtual warehouses with the configuration B 705b.
  • step 706a may comprise configuring one or more virtual warehouses to use five nodes and/or clusters.
  • step 706b may comprise configuring one or more virtual warehouses to use large-sized nodes and/or clusters.
  • step 707a the computing device may measure performance parameters for the configuration A 705a by running the events, logged in step 702, against the testing database generated in step 703. For example, the computing device may measure performance parameters of each of a plurality of different warehouse configurations by replaying, via the first virtual warehouse, the plurality of different events at the testing database. Similarly, in step 707b, the computing device may measure performance parameters for the configuration B 705b by running the events, logged in step 702, against the testing database generated in step 703. These steps may be the same or similar as step 404 of FIG. 4.
  • the computing device may modify the testing database by rolling the testing database back to the state of the database to which it corresponds. For example, after performing step 707a, the computing device may roll back the testing database to its status prior to when step 707a was performed. In turn, this may allow step 707b to be performed against the same testing database.
  • step 708 the computing device may select an optimized virtual warehouse configuration. This step may be the same or similar as step 405 of FIG. 4.
  • step 709 the computing device may output the optimized virtual warehouse configuration. This step may be the same or similar as step 406 of FIG. 4.
  • One or more aspects discussed herein may be embodied in computer-usable or readable data and/or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices as described herein.
  • program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device.
  • the modules may be written in a source code programming language that is subsequently compiled for execution, or may be written in a scripting language such as (but not limited to) HTML or XML.
  • the computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid- state memory, RAM, and the like.
  • the functionality of the program modules may be combined or distributed as desired in various embodiments.
  • the functionality may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like.
  • Particular data structures may be used to more effectively implement one or more aspects discussed herein, and such data structures are contemplated within the scope of computer executable instructions and computer-usable data described herein.
  • Various aspects discussed herein may be embodied as a method, a computing device, a system, and/or a computer program product.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Insulators (AREA)
  • Insulating Bodies (AREA)
EP23706193.2A 2022-02-01 2023-01-24 Optimierung der zuweisung von computerressourcen für virtuelles lagerhaus Pending EP4473418A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/590,358 US20230244687A1 (en) 2022-02-01 2022-02-01 Optimization of Virtual Warehouse Computing Resource Allocation
US17/590,372 US12236278B2 (en) 2022-02-01 2022-02-01 Optimization of virtual warehouse computing resource allocation
PCT/US2023/011421 WO2023150039A1 (en) 2022-02-01 2023-01-24 Optimization of virtual warehouse computing resource allocation

Publications (1)

Publication Number Publication Date
EP4473418A1 true EP4473418A1 (de) 2024-12-11

Family

ID=85284008

Family Applications (1)

Application Number Title Priority Date Filing Date
EP23706193.2A Pending EP4473418A1 (de) 2022-02-01 2023-01-24 Optimierung der zuweisung von computerressourcen für virtuelles lagerhaus

Country Status (3)

Country Link
EP (1) EP4473418A1 (de)
CA (1) CA3243742A1 (de)
WO (1) WO2023150039A1 (de)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8418150B2 (en) * 2009-04-03 2013-04-09 Oracle International Corporation Estimating impact of configuration changes
CN109075994B (zh) * 2016-04-28 2022-04-05 斯诺弗雷克公司 多集群仓库
CN111949631B (zh) * 2019-05-14 2024-06-25 华为技术有限公司 一种确定数据库的配置参数的方法及装置

Also Published As

Publication number Publication date
WO2023150039A1 (en) 2023-08-10
CA3243742A1 (en) 2023-08-10

Similar Documents

Publication Publication Date Title
US20230244687A1 (en) Optimization of Virtual Warehouse Computing Resource Allocation
US9367601B2 (en) Cost-based optimization of configuration parameters and cluster sizing for hadoop
US8949558B2 (en) Cost-aware replication of intermediate data in dataflows
US11914595B2 (en) Virtual warehouse query monitoring and reporting
US12361026B2 (en) Query alerts generation for virtual warehouse
US12386840B2 (en) Dynamic query allocation to virtual warehouses
CN112685499A (zh) 一种工作业务流的流程数据同步方法、装置及设备
US20260037900A1 (en) Virtual Warehouse Analysis And Configuration Planning System
US20250307248A1 (en) Data Certification Process for Updates to Data in Cloud Database Platform
US11392715B1 (en) Data certification process for cloud database platform
US12236278B2 (en) Optimization of virtual warehouse computing resource allocation
EP4473418A1 (de) Optimierung der zuweisung von computerressourcen für virtuelles lagerhaus
US11392587B1 (en) Rule generation and data certification onboarding process for cloud database platform
Cano Optimizing distributed systems using machine learning
CA3225840C (en) Virtual warehouse query monitoring, dynamic query allocation, and query alerts generation
US12547501B2 (en) Virtual warehouse configuration rollback based on query monitoring
CA3225840A1 (en) Virtual warehouse query monitoring, dynamic query allocation, and query alerts generation
EP3483733B1 (de) Verfahren und vorrichtung zur erforschung einer marge zur ressourcenoptimierung in einer anwendungskette
WO2023114164A1 (en) Data certification process for cloud database platform
WO2025027632A1 (en) System and method for generating and managing network performance reports

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20240731

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20260109