WO2023029855A1 - 物化视图的创建方法、装置、存储介质及电子设备 - Google Patents

物化视图的创建方法、装置、存储介质及电子设备 Download PDF

Info

Publication number
WO2023029855A1
WO2023029855A1 PCT/CN2022/109471 CN2022109471W WO2023029855A1 WO 2023029855 A1 WO2023029855 A1 WO 2023029855A1 CN 2022109471 W CN2022109471 W CN 2022109471W WO 2023029855 A1 WO2023029855 A1 WO 2023029855A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
materialized view
candidate
queries
preset
Prior art date
Application number
PCT/CN2022/109471
Other languages
English (en)
French (fr)
Inventor
张友军
郭俊
杨诗旻
Original Assignee
北京火山引擎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京火山引擎科技有限公司 filed Critical 北京火山引擎科技有限公司
Publication of WO2023029855A1 publication Critical patent/WO2023029855A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2393Updating materialised views
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a method, device, storage medium, and electronic device for creating a materialized view.
  • MOLAP On-Line Analytical Processing
  • a materialized view is a collection of precalculated results.
  • the materialized view can be directly reused to read the results of the materialized view, avoiding repeated calculations, and improving query efficiency.
  • establishing a materialized view based on a query requires corresponding computing and storage costs. Therefore, how to select an appropriate query to establish a corresponding materialized view is crucial.
  • the present disclosure provides a method for creating a materialized view, including:
  • a materialized view corresponding to the target query is created, wherein the materialized view is used to process the corresponding query.
  • the present disclosure provides a device for creating a materialized view, including:
  • Extraction module used to extract the original subquery in the historical query statement
  • a generating module configured to generate a candidate query set according to preset rules according to the original sub-query
  • a determining module configured to determine a target query from the set of candidate queries, wherein the target query is a query whose revenue is greater than a preset revenue threshold;
  • a creating module configured to create a materialized view corresponding to the target query, wherein the materialized view is used to process the corresponding query.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the above-mentioned first aspect are implemented.
  • an electronic device including:
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method in the first aspect above.
  • Fig. 1 is a flow chart showing a method for creating a materialized view according to an exemplary embodiment of the present disclosure.
  • Fig. 2 is a flow chart of generating a candidate query set according to an exemplary embodiment of the present disclosure.
  • Fig. 3 is a block diagram showing an apparatus for creating a materialized view according to an exemplary embodiment of the present disclosure.
  • Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • creating a materialized view requires computing and storage costs. If the hit rate of the created materialized view is low, the query benefit of the materialized view will be low. Among them, the query benefit passes through the cost (computing cost and storage cost ) and the hit rate is determined. For example, the cost of creating the first materialized view is the same as the cost of the second materialized view, but the hit count of the first materialized view is 1, and the hit count of the second materialized view is 10, then the query benefit of the first materialized view is lower than The query proceeds from the second materialized view.
  • an embodiment of the present disclosure provides a method for creating a materialized view, which uses the original subquery extracted from the historical query statement to construct a candidate query set, determines the target query in the candidate query set, and creates a query corresponding to the target query.
  • materialized view because the target query is a query whose query revenue in the candidate query set is greater than the preset revenue threshold, so the hit rate of the materialized view can be improved; in addition, the materialized view is only established based on the target query, which can reduce the calculation cost of the materialized view and storage costs.
  • a materialized view corresponding to the query statement can be created, so that when subsequent queries corresponding to the query statement are performed, you can directly Get the results by querying the materialized view.
  • Fig. 1 is a flow chart showing a method for creating a materialized view according to an exemplary embodiment of the present disclosure.
  • the creation method of the materialized view includes:
  • Step 101 extracting the original sub-query in the historical query statement.
  • the historical query statement is a historical structured query statement before creating the materialized view, such as an SQL statement.
  • the following uses historical query statements as historical SQL statements to explain this embodiment.
  • a grammatical analysis may be performed on the historical SQL statement to obtain a grammatical tree, and then based on the grammatical tree, a subquery in the grammatical tree may be extracted as an original subquery.
  • the syntax tree is a graphical representation of the sentence structure, which represents the derivation result of the sentence, and is conducive to understanding the level of the sentence's grammatical structure. Therefore, it is convenient to extract all subqueries in the SQL statement by processing the SQL statement into a syntax tree.
  • Step 102 generate a candidate query set according to preset rules according to the original sub-query.
  • the preset rule is to generate a query different from the original sub-query on the basis of the original sub-query.
  • two original sub-queries can be combined to generate a query that distinguishes the query conditions corresponding to the two sub-queries, thereby increasing the diversity of query conditions and further increasing the number of queries.
  • the query that can cover multiple types of query conditions can be generated through the original sub-query, and the hit rate of the materialized view can be improved if such a query is used to create a materialized view.
  • Step 103 determining a target query from the candidate query set, wherein the target query is a query whose query revenue is greater than a preset revenue threshold.
  • target query represents the query scope corresponding to the data that the user cares about.
  • the query revenue may be determined according to the number of queries, the size of the base table, and the parameters of the queried columns in the base table.
  • the parameter may be, for example, a convergence ratio.
  • the convergence ratio can reflect the storage cost of the materialized view. A query with a higher convergence ratio has a lower cost of the corresponding materialized view.
  • Step 104 creating a materialized view corresponding to the target query, wherein the materialized view is used to process the corresponding query.
  • the query result corresponding to the query can be extracted from the materialized view, then the query is the corresponding query of the materialized view.
  • materialized views can be applied in the following ways.
  • the Oracle database When receiving a new query, the Oracle database will automatically determine whether the query result can be obtained by querying the materialized view. If yes, the new query will be rewritten, and the rewritten query statement will be executed. The rewritten query statement will directly Read data from the materialized view that has been calculated, so as to obtain the query results corresponding to the new query from the corresponding materialized view, avoiding aggregation or join operations.
  • the query rewriting is to rewrite the query statement.
  • the related process of judging whether to obtain the query result by querying the materialized view and the related process of query rewriting may refer to related technologies, and the embodiments of the present disclosure will not repeat them here.
  • the hit rate of the materialized view can be improved; in addition, compared with the scheme of establishing a corresponding materialized view based on each subquery, the calculation cost and storage cost of the materialized view can be reduced.
  • the performance of the system also tends to be stable.
  • the original sub-query includes table fields
  • the above step 102 can be implemented in the following manner: according to the table fields of the original sub-query, the original sub-query with the same table field is divided into the same group to obtain multiple original sub-queries set; for each original subquery set, generate a candidate query set according to preset rules according to the original subqueries in the original subquery set.
  • the table field of the original subquery represents the base table corresponding to the data to be queried by the original subquery.
  • the base table includes dimension columns, index columns, and the like.
  • the dimension column can be, for example, employee ID, transaction date, etc.
  • the index column can be salary, etc.
  • the original subquery for example, can be to query the monthly data whose salary of employee 1 is greater than 10,000.
  • the original sub-query further includes: a filter condition field, a dimension column field, an index column field, and a query times field.
  • the filter condition field, the dimension column field, and the index column field represent the query conditions of the original subquery; the query times of the original subquery represent the number of times the original subquery has been queried in history.
  • the divided original subquery set includes two.
  • An original subquery set includes only Querypattern1, Querypattern2, Querypattern3, Querypattern4, and Querypattern5 whose table field is A.
  • Another original subquery set is Querypattern6 and Querypattern7 that only include table fields A and B.
  • the original subqueries are grouped based on the table fields of the original subqueries. Since the original subqueries corresponding to the same table field have high correlation, the selection of the target query is optimized, thereby improving the optimization of the materialized view set Effect.
  • a candidate query set can be generated through the steps in the flow chart shown in Figure 2, referring to Figure 2, including the following steps:
  • Step 201 use the original sub-query set as the current sub-query set.
  • the original subquery set includes Querypattern1, Querypattern2, Querypattern3, Querypattern4, and Querypattern5, and the current subquery set includes Querypattern1, Querypattern2, Querypattern3, Querypattern4, and Querypattern5.
  • Step 202 according to the queries in the current sub-query set, a candidate query set is generated according to preset combining rules.
  • the preset merging rule merges the queries in the current subquery set, for example, Querypattern1 and Querypattern2 can be merged to generate a new query.
  • the query in the candidate query set includes a filter condition field, a dimension column field, an index column field, and a query count field
  • the candidate query set can be generated according to preset merging rules according to the queries in the current sub-query set in the following manner: For every two queries in the current subquery set, merge the two queries according to the filter condition field, dimension column field, index column field and query times field of the two queries; according to the merged result and the current subquery set includes query to generate a candidate query set.
  • the candidate query sets generated for Querypattern1, Querypattern2, Querypattern3, Querypattern4 and Querypattern5 are shown in Table 2 below.
  • indexes 1-5 correspond to Querypattern1, Querypattern2, Querypattern3,
  • the query corresponding to index 6-15 is the query obtained by combining the query patterns corresponding to index 1-5.
  • the filter condition field of the merged query is the intersection of the filter condition fields of the two queries, for example , referring to the filter condition field of the query corresponding to index 6 obtained by combining the queries corresponding to index 1 and index 2. If there is no intersection between the filter condition fields of the two queries to be merged, the filter condition fields of the merged query are a superset of the filter condition fields of the two queries, for example, refer to the corresponding The filter condition field obtained by the query corresponding to the index 10 obtained by querying and merging.
  • the dimension column fields of the merged query are the union of the dimension column fields of the two queries that are merged. For example, refer to the dimension column field of the query corresponding to index 8 obtained by merging the queries corresponding to index 1 and index 4 .
  • the index column fields of the merged query are the union of the index column fields of the two queries that are merged. For example, refer to the index column field of the query corresponding to index 8 obtained by combining the queries corresponding to index 1 and index 4 .
  • the combined query times are the sum of the combined query times of the two queries. For example, refer to the query times of the query corresponding to index 8 obtained by combining the queries corresponding to index 1 and index 4 .
  • Step 203 sort the queries in the candidate query set according to the preset sorting rules, and select a preset number of queries from the sorting results according to the sorting results to form a new candidate query set.
  • the preset sorting rule is to arrange all the queries in the candidate query set according to certain rules.
  • the preset sorting rules can be sorted according to the filter condition field, the dimension column field and the query times field.
  • the queries in the candidate query set can be arranged in the following arrangement:
  • the hit rate can be improved.
  • sorting order determines the importance of generating materialized views for different queries
  • the order of each sorting rule is very important when there are multiple sorting rules. Based on this, considering the strength of various sorting rules when sorting, sorting according to the sorting order (dimension column-query times-filtering conditions) set above ensures that the optimal query can be screened to the greatest extent.
  • a new candidate query set is formed including queries corresponding to index 5, index 3, index 14, index 9 and index 7 respectively.
  • Step 204 judging whether a preset iteration condition is satisfied.
  • step 205 is performed: outputting the current candidate query set. If it is determined that the preset iteration condition is not satisfied, execute step 206: use the new candidate query set as the new current sub-query set, and return to execute step 202.
  • the preset iteration condition may be whether the number of iterations reaches a preset number.
  • the number of executions of step 202 may be counted, and step 204 may be: determine whether the number of executions of step 202 reaches a preset number of times. And when the number of executions of step 202 reaches the preset number of times, it is determined that the preset iteration condition is satisfied; when the number of executions of step 202 does not reach the preset number of times, it is determined that the preset iteration condition is not satisfied.
  • preset times may be set according to actual conditions, which is not limited in this embodiment.
  • a new candidate query set includes queries corresponding to index 5 , index 3 , index 14 , index 9 and index 7 respectively.
  • the output current candidate query set is a candidate query set composed of queries corresponding to index 5, index 3, index 14, index 9 and index 7 respectively;
  • a candidate query set composed of queries corresponding to index 5, index 3, index 14, index 9, and index 7 is used as the current sub-query set, and step 202 is executed.
  • the queries in the candidate query set are continuously optimized in an iterative manner, so as to improve the hit rate of the materialized view established based on the queries in the optimized candidate query set.
  • step 103 shown in FIG. 1 can be implemented in the following manner: calculating the index parameter of each query in the candidate query set; selecting the index parameter to satisfy the corresponding index parameter according to the index parameter of each query in the candidate query set The query with preset parameter conditions is used as the target query.
  • the target query is selected from the candidate query set generated in the last iteration.
  • the index parameter may include the data volume of the base table corresponding to the query in the candidate query set, the query times of the query, and the convergence ratio of the query. Considering that the cost of querying tables with small data is not high, there is no need to establish a materialized view for queries corresponding to such tables.
  • the query times of the query may reflect the hit rate of the materialized view corresponding to the subsequent query. Therefore, a query with a higher number of queries is more suitable for establishing a materialized view.
  • the data volume of the base table can be obtained by reading the storage space occupied by the base table in the database.
  • the query count of the query can be obtained through the query count field of the query.
  • the convergence ratio may be determined based on the number of rows of the dimension column corresponding to the dimension column field, and the number of rows may be understood as the number of values of the dimension column.
  • the convergence ratio may be the ratio of the number of rows before calculation to the number of rows after calculation.
  • the number of rows in the base table corresponding to the a dimension column represented by the dimension column field is 1000 rows (that is, the number of rows before calculation), there are 1000 values, and
  • the number of rows of the calculated query result corresponding to the query with an index of 5 is 10 rows (that is, the number of rows after calculation), that is, the query result corresponds to only 10 values of the a-dimension column, and the convergence ratio at this time is 10.
  • the ratio of the sum of the convergence ratios corresponding to each dimension column to the number of dimension columns may be determined as the convergence rate corresponding to the query. Compare.
  • the target query is a query in which all index parameters satisfy the respective corresponding preset parameter conditions.
  • the data volume in the base table is less than the preset data volume threshold, and the number of queries is greater than the preset query times threshold
  • a query whose convergence ratio is greater than a preset convergence ratio threshold is a target query.
  • the preset data volume threshold, the preset query times threshold, and the preset convergence ratio threshold can be set according to actual conditions. This embodiment is not limited here.
  • multi-dimensional index parameters are used to comprehensively select the target query, and the target selection is more comprehensive, which improves the optimization effect of the target query, thereby improving the quality of the materialized view, and provides reliable support for subsequent applications based on materialized views. data base.
  • step 104 shown in FIG. 1 can be implemented in the following manner: setting the data period of the materialized view corresponding to the target query, wherein the materialized view is included in the base table corresponding to the table field of the target query and Data corresponding to the data period; generate a data definition language according to the target query and the data period; execute the data definition language to create a materialized view corresponding to the target query.
  • the data period represents the data range of the initialization data of the materialized view.
  • the data period may be data that meets the query condition within 2 days. Based on this, the materialized view created includes two days' worth of data.
  • the materialized view will be updated based on the data update of the base table.
  • the data volume of the materialized view increases, resulting in an increase in the storage cost of the materialized view.
  • the query based on this materialized view It also tends to query the most recent data. Therefore, the probability that the data stored earlier in the materialized view will not be hit increases. Based on this, you can set the data period to ensure that expired data in the materialized view are discarded, thereby reducing storage costs.
  • the initially created materialized view includes data with a data period of 3 days (for example, No. 1-3).
  • the relevant In the prior art the data in No. 4 that satisfies the filter condition of the query corresponding to the materialized view is updated to the materialized view.
  • the data of No. 1-4 are included in the materialized view.
  • the hit rate for the No. 1 data in the materialized view drops.
  • the storage cost of the materialized view will be greatly increased.
  • the storage cost of the materialized view can be reduced under the condition that the related data of the latest 3 days can be queried.
  • the data period can be determined based on the historical records corresponding to the target query.
  • the history records characterize the period of data required for each query.
  • an update task may be created while creating the materialized view for updating the materialized view.
  • the data definition language may be an executable SQL statement. Based on this type of statement, the creation of a materialized view is realized.
  • the method further includes: detecting whether the materialized view belongs to an invalid view; if it is detected that the materialized view belongs to the invalid view, deleting the materialized view.
  • invalid views may be materialized views that are not hit within a preset time period. For example, when it is detected that the materialized view is not hit within 3 days, it can be determined that the materialized view is an invalid view.
  • the failure view may also be a materialized view that misses a preset number of times within a preset time period.
  • the preset duration can be set according to the actual situation, and the preset duration can be 1 day or 2 days. This embodiment is not limited here.
  • the invalidated view may be a materialized view whose corresponding base table has been deleted.
  • the query result included in the materialized view is the data in the base table A, and when it is detected that the base table A is deleted from the storage, it can be determined that the materialized view is an invalid view.
  • the invalidation view may be a materialized view in which the corresponding base table data is changed.
  • the query result included in the materialized view is the data in the base table A, and when a part of the data in the base table A is detected to change, it can be determined that the materialized view is an invalid view.
  • the materialized view can be deleted when the data in the base table corresponding to the materialized view changes, so as to avoid querying the materialized view based on data errors to obtain wrong query results.
  • the aforementioned detection can be implemented based on the corresponding monitoring task created for the materialized view.
  • deleting the materialized view may be implemented by generating a corresponding SQL execution statement.
  • the tasks created for the materialized view may also be deleted, for example, updating tasks and monitoring tasks.
  • an embodiment of the present disclosure provides an apparatus for creating a materialized view.
  • the creation apparatus 300 includes:
  • An extraction module 301 configured to extract the original sub-query in the historical query statement
  • a generating module 302 configured to generate a candidate query set according to preset rules according to the original sub-query
  • a determining module 303 configured to determine a target query from the candidate query set, wherein the target query is a query whose query revenue is greater than a preset revenue threshold;
  • the creating module 304 is configured to create a materialized view corresponding to the target query, wherein the materialized view is used to process the corresponding query.
  • the original sub-query includes table fields
  • the generating module 302 includes:
  • the grouping submodule is used to divide the original subquery with the same table field into the same group according to the table fields of the original subquery, so as to obtain multiple original subquery sets;
  • the generation sub-module is used for generating candidate query sets according to preset rules for each original sub-query set according to the original sub-queries in the original sub-query set.
  • the generating submodule includes:
  • the first determining subunit is configured to use the original sub-query set as the current sub-query set;
  • a merging subunit configured to generate a candidate query set according to preset merging rules according to the queries in the current sub-query set;
  • a sorting subunit configured to sort the queries in the candidate query set according to a preset sorting rule; and according to the sorting result, select a preset number of queries from the sorting result to form a new candidate query set;
  • the second determining subunit is configured to use the new candidate query set as the new current sub-query set when it is determined that the preset iteration condition is not satisfied.
  • the queries in the candidate query set include a filter condition field, a dimension column field, an index column field, and a query count field
  • the merging subunit is specifically configured to target every two queries in the current sub-query set, according to The filter condition field, the dimension column field, the index column field and the query count field of the two queries are merged; and a candidate query set is generated according to the merged result and the queries included in the current sub-query set.
  • the determining module 303 includes:
  • a calculation submodule configured to calculate the index parameter of each query in the candidate query set
  • the selection sub-module is configured to, according to the index parameter of each query in the candidate query set, select a query whose index parameter satisfies a preset parameter condition corresponding to the index parameter as a target query.
  • the creation module 304 includes:
  • a setting submodule configured to set a data period of a materialized view corresponding to the target query, wherein the materialized view includes data corresponding to the data period in a base table corresponding to a table field of the target query ;
  • a statement generating submodule configured to generate a data definition language according to the target query and the data period
  • the execution sub-module is used to execute the data definition language to create a materialized view corresponding to the target query.
  • the creating device 300 also includes:
  • a detection module configured to detect whether the materialized view belongs to an invalid view
  • a deletion module configured to delete the materialized view when it is detected that the materialized view belongs to the invalid view.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the creation method described in the method embodiment are implemented.
  • an electronic device including:
  • a processing device configured to execute the computer program in the storage device, so as to implement the steps of the creation method in the method embodiments.
  • FIG. 4 it shows a schematic structural diagram of an electronic device 400 suitable for implementing an embodiment of the present disclosure.
  • the terminal equipment in the embodiment of the present disclosure may include but not limited to such as mobile phone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable multimedia player), vehicle terminal (such as mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers and the like.
  • the electronic device shown in FIG. 6 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device 400 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 401, which may be randomly accessed according to a program stored in a read-only memory (ROM) 402 or loaded from a storage device 408.
  • a processing device such as a central processing unit, a graphics processing unit, etc.
  • RAM read-only memory
  • various appropriate actions and processes are executed by programs in the memory (RAM) 403 .
  • RAM 403 In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to bus 404 .
  • the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a liquid crystal display (LCD), speaker, vibration an output device 407 such as a computer; a storage device 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 4 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 409, or from storage means 408, or from ROM 402.
  • the processing device 401 When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the electronic device can communicate with any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol, Hypertext Transfer Protocol), and can communicate with digital data in any form or medium (for example, communication networks) interconnect.
  • network protocols such as HTTP (HyperText Transfer Protocol, Hypertext Transfer Protocol)
  • Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network of.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: extracts the original sub-query in the historical query statement; according to the original sub-query, according to generating a candidate query set according to preset rules; determining a target query from the candidate query set, wherein the target query is a query whose query revenue is greater than a preset revenue threshold; creating a materialized view corresponding to the target query, wherein the Materialized views are used to process the corresponding queries.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages—such as Java, Smalltalk, C++, and Includes conventional procedural programming languages - such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, using an Internet service provider to connected via the Internet).
  • LAN local area network
  • WAN wide area network
  • Internet service provider for example, using an Internet service provider to connected via the Internet.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more logical functions for implementing specified executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation of the module itself under certain circumstances, for example, the extraction module may also be described as "a module for extracting original subqueries in historical query statements".
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logical device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • magnetic storage or any suitable combination of the foregoing.
  • Example 1 provides a method for creating a materialized view, including:
  • a materialized view corresponding to the target query is created, wherein the materialized view is used to process the corresponding query.
  • Example 2 provides the method of Example 1, wherein the original subquery includes table fields, and according to the original subquery, generating a candidate query set according to preset rules includes:
  • the original sub-query with the same table field is divided into the same group to obtain a plurality of original sub-query sets;
  • a candidate query set is generated according to preset rules according to the original sub-queries in the original sub-query set.
  • Example 3 provides the method of Example 2, wherein for each of the original sub-query sets, according to the original sub-queries in the original sub-query set, candidate queries are generated according to preset rules set, including:
  • Example 4 provides the method of Example 3, the queries in the candidate query set include a filter condition field, a dimension column field, an index column field, and a query times field, and the current Queries in the sub-query set generate candidate query sets according to preset merging rules, including:
  • a candidate query set is generated according to the combination result and the queries included in the current sub-query set.
  • Example 5 provides the method of Example 1, the determining the target query from the candidate query set includes:
  • a query whose index parameter satisfies a preset parameter condition corresponding to the index parameter is selected as a target query.
  • Example 6 provides the method of any one of Examples 1-5, the creating a materialized view corresponding to the target query includes:
  • the data definition language is executed to create a materialized view corresponding to the target query.
  • Example 7 provides the method of any one of Examples 1-5, the method further comprising:
  • the materialized view is deleted.
  • Example 8 provides an apparatus for creating a materialized view, including:
  • Extraction module used to extract the original subquery in the historical query statement
  • a generating module configured to generate a candidate query set according to preset rules according to the original sub-query
  • a determining module configured to determine a target query from the set of candidate queries, wherein the target query is a query whose revenue is greater than a preset revenue threshold;
  • a creating module configured to create a materialized view corresponding to the target query, wherein the materialized view is used to process the corresponding query.
  • Example 9 provides a computer-readable medium, on which a computer program is stored, characterized in that, when the program is executed by a processing device, the method described in any one of Examples 1-7 is implemented A step of.
  • Example 10 provides an electronic device, characterized in that it includes:
  • a processing device configured to execute the computer program in the storage device to implement the steps of any one of the methods in Examples 1-7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本公开涉及一种物化视图的创建方法、装置、存储介质及电子设备,方法包括:提取历史查询语句中的原始子查询;根据所述原始子查询,按照预设规则生成候选查询集;从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询,由于目标查询是候选查询集中查询收益大于预设收益阈值的查询,如此,便可以提高物化视图的命中率;此外,只基于目标查询建立物化视图,可以降低物化视图的计算成本和存储成本。

Description

物化视图的创建方法、装置、存储介质及电子设备
相关申请的交叉引用
本申请基于申请号为202111032832.8、申请日为2021年09月03日,名称为“物化视图的创建方法、装置、存储介质及电子设备”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本公开涉及计算机技术领域,具体地,涉及一种物化视图的创建方法、装置、存储介质及电子设备。
背景技术
在数据分析领域中,联机分析处理技术(On-Line Analytical Processing,MOLAP)占据着非常重要的地位,它通过基于预计算的数据查询技术极大地缩短了数据分析的响应时间,降低了数据规模对于查询效率的影响。
然而,联机分析处理技术的应用需要依赖于物化视图。物化视图是预计算结果的集合,在遇到与物化视图对应的查询时,可以对物化视图直接进行复用,读取物化视图的结果,避免重复计算,以达到提升查询效率的目的。此外,基于查询的建立物化视图需要相应的计算成本和存储成本,因此,如何选取合适的查询来建立相应的物化视图是至关重要的。
发明内容
提供该发明内容部分以便以简要的形式介绍构思,这些构思将在后面的具体实施方式部分被详细描述。该发明内容部分并不旨在标识要求保护的技术方案的关键特征或必要特征,也不旨在用于限制所要求的保护的技术方案的范围。
第一方面,本公开提供一种物化视图的创建方法,包括:
提取历史查询语句中的原始子查询;
根据所述原始子查询,按照预设规则生成候选查询集;
从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
第二方面,本公开提供一种物化视图的创建装置,包括:
提取模块,用于提取历史查询语句中的原始子查询;
生成模块,用于根据所述原始子查询,按照预设规则生成候选查询集;
确定模块,用于从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
创建模块,用于创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
第三方面,本公开提供一种计算机可读介质,其上存储有计算机程序,该程序被处理装置执行时实现上述第一方面中所述方法的步骤。
第四方面,本公开提供一种电子设备,包括:
存储装置,其上存储有计算机程序;
处理装置,用于执行所述存储装置中的所述计算机程序,以实现上述第一方面中所述方法的步骤。
通过上述技术方案,利用从历史查询语句中提取到的原始子查询来构建候选查询集,在构建候选查询集中确定目标查询,并创建与目标查询对应的物化视图,由于目标查询是候选查询集中查询收益大于预设收益阈值的查询,如此,便可以提高物化视图的命中率;此外,只基于目标查询建立物化视图,可以降低物化视图的计算成本和存储成本。
本公开的其他特征和优点将在随后的具体实施方式部分予以详细说明。
附图说明
结合附图并参考以下具体实施方式,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。贯穿附图中,相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的,原件和元素不一定按照比例绘制。在附图中:
图1是根据本公开一示例性实施例示出的一种物化视图的创建方法的流程图。
图2是根据本公开一示例性实施例示出的一种生成候选查询集的流程图。
图3是根据本公开一示例性实施例示出的一种物化视图的创建装置的框图。
图4是根据本公开一示例性实施例示出的一种电子设备的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
可以理解的是,创建一个物化视图需要计算成本和存储成本,若创建的物化视图的命中率较低,将导致该物化视图的查询收益较低,其中,查询收益通过成本(计算成本和存储成本)和命中率确定。例如,创建第一物化视图的成本与第二物化视图的成本相同,但第一物化视图的命中次数为1,第二物化视图的命中次数为10,则第一物化视图的查询收益是低于第二物化视图的查询收益的。
在相关技术中,针对查询语句的所有子查询若均创建一对应的物化视图,由于子查询较多,但某些子查询对应的物化视图的命中率降低,导致物化视图的成本较高。因此,如何确定能囊括大部分高频查询语句的查询,并以此来建立高命中率的物化视图是至关重要的。
有鉴于此,本公开实施例提供一种物化视图的创建方法,利用从历史查询语句中提取到的原始子查询来构建候选查询集,在构建候选查询集中确定目标查询,并创建与目标查询对应的物化视图,由于目标查询是候选查询集中查询收益大于预设收益阈值的查询,如此,便可以提高物化视图的命中率;此外,只基于目标查询建立物化视图,可以降低物化视图的计算成本和存储成本。
为了帮助理解本公开实施例的物化视图的创建方法,在对本公开实施例进行详细说明之前,先对本公开实施例中可能的应用场景进行说明。
在一些可能的应用场景下,当需要多次计算相同的查询语句时,针对此类查询,可以创建与该查询语句对应的物化视图,便于在后续进行与该查询语句对应的查询时,可以直接通过查询物化视图来得到结果。
图1是根据本公开一示例性实施例示出的一种物化视图的创建方法的流程图。参照图1,该物化视图的创建方法包括:
步骤101,提取历史查询语句中的原始子查询。
可以理解的是,在提取原始子查询之前,需要获取历史查询语句,查询语句为附带有查询条件的语句,确定了用户需要查询的范围。
需要说明的是,历史查询语句为创建物化视图之前的历史结构化查询语句,例如SQL语句。以下以历史查询语句为历史SQL语句对本实施例进行解释说明。
在一些实施例中,可以对历史SQL语句进行语法解析,得到语法树,再基于该语法树,提取该语法树中的子查询作为原始子查询。其中,语法树是句子结构的图形表示,它代表了句子的推导结果,有利于理解句子语法结构的层次。因此,通过将SQL语句处理成语法树便于提取SQL语句中所有的子查询。
步骤102,根据原始子查询,按照预设规则生成候选查询集。
其中,预设规则是为了在原始子查询的基础上,生成区别于原始子查询的查询。
在一些实施例中,可以将两两原始子查询进行合并,生成一种区别与该两个子查询所对应的查询条件的查询,增加查询条件的多样化,进而增加查询的数量。
需要说明的是,通过原始子查询可以生成可以覆盖多类查询条件的查询,若以此类查询建立物化视图,可以提高物化视图的命中率。
步骤103,从候选查询集中确定目标查询,其中,目标查询是查询收益大于预设收益阈值的查询。
需要说明的是,目标查询表征了用户关心的数据所对应的查询范围。
在一些实施例中,查询收益可以是根据查询次数、基表的大小以及基表中被查询的列的参数所确定。其中,该参数例如可以是收敛比。收敛比可以反映物化视图的存储成本,收敛比越高的查询,其对应的物化视图的成本越低。
步骤104,创建与目标查询对应的物化视图,其中,物化视图用于处理相应的查询。
需要说明的是,针对某个查询,可以从物化视图中抽取到与该查询对应的查询结果,则该查询则为该物化视图的相应查询。
在一些实施例中,可以采用以下方式对物化视图进行应用。在接收到新查询时,Oracle数据库会自动判断能否通过查询物化视图来得到查询结果,如果可以,则将新查询进行查询重写,执行重写后的查询语句,重写后的查询语句直接从已经计算好的物化视图中读取数据,以便于从相应地物化视图中得到新查询对应的查询结果,避免了聚集或连接操作。
其中,查询重写是对查询语句进行重写。另外,判断能否通过查询物化视图来得到查询结果的相关过程与查询重写的相关过程可以参照相关技术,本公开实施例在此不做赘述。
通过上述方案,利用从历史查询语句中提取到的原始子查询来构建候选查询集,在构建候选查询集中确定目标查询,并创建与目标查询对应的物化视图,由于目标查询是候选查询集中查询收益大于预设收益阈值的查询,如此,便可以提高物化视图的命中率;此外,相较于基于每种子查询均建立对应的物化视图的方案来说可以降低物化视图的计算成本和存储成本,在降低计算成本和存储成本的前提上,系统的性能也趋于稳定。
为了使得本领域技术人员更加理解本公开提供的物化视图的创建方法,下面对上述各步骤进行详细举例说明。
在一些实施例中,原始子查询包括表字段,上述步骤102可以通过以下方式实施:根据原始子查询的表字段,将具有相同表字段的原始子查询划分为同一组,得到多个原始子查询集;针对每个原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集。
需要说明的是,原始子查询的表字段表征该原始子查询想要查询的数据所对应的基表。其中,基表中包括维度列、指标列等。在一些应用场景下,维度列例如可以是员工标识、交易日期等,指标列可以是薪水等,相应的,原始子查询例如可以是查询员工1的薪水大于10000的月份数据。
在一些实施例中,原始子查询还包括:过滤条件字段、维度列字段、指标列字段和查询次数字段。其中,过滤条件字段、维度列字段、指标列字段表征原始子查询的查询条件;原始子查询的查询次数表征该原始子查询历史查询过的次数。
如表1所示,包括提取的原始子查询以及与原始子查询相关的字段说明,根据表1中所示的各原始子查询和各原始子查询对应的表字段,划分的原始子查询集包括两个。一个原始子查询集是仅包括表字段为A的Querypattern1,Querypattern2,Querypattern3,Querypattern4以及Querypattern5。另一个原始子查询集是仅包括表字段为A和B的Querypattern6以及Querypattern7。
Figure PCTCN2022109471-appb-000001
表1
通过上述方式,因此,基于原始子查询的表字段对原始子查询进行分组,由于同一表字段对应的原始子查询的相关性较高,优化了目标查询的选择,进而提高了物化视图集的优化效果。
在一些实施例中,以表1所确定的原始子查询集包括Querypattern1,Querypattern2,Querypattern3,Querypattern4以及Querypattern5为例,并结合图2示出的一种生成候选查询集的流程图,对候选查询集的生成过程进行解释说明。针对每个原始子查询集,可以通过图2所示的流程图中的步骤生成候选查询集,参照图2,包括以下步骤:
步骤201,将原始子查询集作为当前子查询集。
示例地,原始子查询集包括Querypattern1,Querypattern2,Querypattern3,Querypattern4以及Querypattern5,则当前子查询集包括Querypattern1,Querypattern2,Querypattern3,Querypattern4以及Querypattern5。
步骤202,根据当前子查询集中的查询,按照预设合并规则生成候选查询集。
需要说明的是,预设合并规则针对当前子查询集中的查询进行合并,例如,针对Querypattern1和Querypattern2,可以合并生成一个新的查询。
在一些实施例中,候选查询集中的查询包括过滤条件字段、维度列字段、指标列字段和查询次数字段,可以采用以下方式根据当前子查询集中的查询,按照预设合并规则生成候选查询集:针对当前子查询集中的每两个查询,根据该两个查询的过滤条件字段、维度列字段、指标列字段和查询次数字段,对该两个查询进行合并;根据合并结果和当前子查询集中包括的查询,生成候选查询集。
示例性地,针对Querypattern1,Querypattern2,Querypattern3,Querypattern4以及Querypattern5生成的候选查询集如下表2所示。
索引 过滤条件字段 维度列字段 指标列字段 查询次数
1 d>0,e<10 a,b f,g 10
2 d>0 a,b f,h 5
3 e<10 a f 15
4 d>0 b,c f,h,i 20
5 e>0 a f,i 20
6(1&2) d>0 a,b f,g,h 15(10+5)
7(1&3) e<10 a,b f,g 25(10+15)
8(1&4) d>0 a,b c f,g,h,i 30(10+20)
9(1&5) e>0 a,b f,g,i 30(10+20)
10(2&3) / a,b,d,e f,h 20(5+15)
11(2&4) d>0 a,b c f,h,i 25(5+20)
12(2&5) / a,b,d,e f,h,i 25(5+20)
13(3&4) / a,b,c,d,e f,h,i 35(15+20)
14(3&5 / a,e f,i 35(15+20)
15(4&5) / a,b,c,d,e f,h,i 40(20+20)
表2
需要说明的是,索引1-5分别对应Querypattern1,Querypattern2,Querypattern3,
Querypattern4以及Querypattern5。索引6-15对应的查询是索引1-5所对应的Querypattern两两合并得到的查询。
针对合并得到的查询的过滤条件字段来说,在进行合并的两个查询的过滤条件字段存在交集的情况下,合并得到的查询的过滤条件字段取该两个查询的过滤条件字段的交集, 例如,参照索引1和索引2所对应的查询合并得到的索引6所对应的查询的过滤条件字段。在进行合并的两个查询的过滤条件字段不存在交集的情况下,合并得到的查询的过滤条件字段取该两个查询的过滤条件字段的超集,例如,参照索引2和索引3所对应的查询合并得到的索引10所对应的查询得到的过滤条件字段。
针对合并得到的查询的维度列字段来说,合并得到的查询的维度列字段是进行合并的两个查询的维度列字段的并集。例如,参照索引1和索引4所对应的查询合并得到的索引8所对应的查询的维度列字段。
针对合并得到的查询的指标列字段来说,合并得到的查询的指标列字段是进行合并的两个查询的指标列字段的并集。例如,参照索引1和索引4所对应的查询合并得到的索引8所对应的查询的指标列字段。
针对合并得到的查询的查询次数来说,合并得到的查询的查询次数是进行合并的两个查询的查询次数的和。例如,参照索引1和索引4所对应的查询合并得到的索引8所对应的查询的查询次数。
步骤203,对候选查询集中的查询按照预设排序规则进行排序,并根据排序结果,从排序结果中选择预设数量的查询,组成新的候选查询集。
需要说明的是,预设排序规则是将候选查询集中的所有查询按照一定的规则进行排列。
在一些实施例中,预设排列规则可以根据过滤条件字段、维度列字段和查询次数字段进行排序。
进一步地,在预设排列规则包括根据过滤条件字段、维度列字段和查询次数字段设置的规则的情况下,在一些实施例中,可以按照以下排列方式对候选查询集中的查询进行排列:
将候选查询集中的查询按照维度列的个数进行升序排列,得到第一排序结果;在第一排序结果基础上,按照查询次数对第一排序结果进行降序排列,得到第二排序结果;在第二排序结果基础上,按照过滤条件的条件个数对第二排序结果进行升序排序,得到最终的排序结果。示例地,按照此排序方式对表2所示的候选查询集进行排列后的排序结果如下表3所示。
索引 过滤条件字段 维度列字段 指标列字段 查询次数
5 e>0 a f,i 20
3 e<10 a f 15
14(3&5 / a,e f,i 35(15+20)
9(1&5) e>0 a,b f,g,i 30(10+20)
7(1&3) e<10 a,b f,g 25(10+15)
4 d>0 b,c f,h,i 20
6(1&2) d>0 a,b f,g,h 15(10+5)
1 d>0,e<10 a,b f,g 10
2 d>0 a,b f,h 5
8(1&4) d>0 a,b c f,g,h,i 30(10+20)
11(2&4) d>0 a,b c f,h,i 25(5+20)
12(2&5) / a,b,d,e f,h,i 25(5+20)
10(2&3) / a,b,d,e f,h 20(5+15)
15(4&5) / a,b,c,d,e f,h,i 40(20+20)
13(3&4) / a,b,c,d,e f,h,i 35(15+20)
表3
应当说明的是,维度列的个数越少,预期收敛比越高,所需存储成本越低;查询次数越高,则预期的命中率越高;过滤条件的条件个数越少,则覆盖的查询场景越多,从而能够提升命中率。
考虑到由于排列顺序决定了不同查询需要生成物化视图的重要性,因此,在存在多种排列规则的前提下,各个排列规则的顺序显得至关重要。基于此,考虑到排列时各类排列规则的强弱,按照上述设置的排序顺序(维度列-查询次数-过滤条件)进行排序,最大程度保证了可以筛选到最优的查询。
在一些实施例中,以预设数量为5为例,根据表3的排序结果,组成新的候选查询集包括索引5、索引3、索引14、索引9和索引7所分别对应的查询。
步骤204,判断是否满足预设迭代条件。在确定满足预设迭代条件的情况下,执行步骤205:输出当前的候选查询集。在确定不满足预设迭代条件的情况下,执行步骤206:将新的候选查询集作为新的当前子查询集,并返回执行步骤202。
在一些实施例中,预设迭代条件可以是迭代次数是否达到预设次数。
在一些实施例中,可以统计步骤202的执行次数,步骤204可以为:判断步骤202的执行次数是否达到预设次数。且在步骤202的执行次数达到预设次数的情况下,确定满足预设迭代条件;在步骤202的执行次数未达到预设次数的情况下,确定不满足预设迭代条件。
需要说明的是,预设次数可以根据实际情况进行设定,本实施例在此不作限定。
示例地,以组成新的候选查询集包括索引5、索引3、索引14、索引9和索引7所分别对应的查询。在确定满足预设迭代条件的情况下,输出的当前的候选查询集则是包括索引5、索引3、索引14、索引9和索引7所分别对应的查询组成的候选查询集;在确定不满足预设迭代条件的情况下,以包括索引5、索引3、索引14、索引9和索引7所分别对应的查询组成的候选查询集作为当前子查询集,并执行步骤202。
通过上述方式,采用迭代的方式来不断优化候选查询集中的查询,以此来提高依据优化的候选查询集中的查询所建立的物化视图的命中率。
在一些实施例中,图1所示的步骤103可以通过以下方式实施:计算候选查询集中每个查询的指标参数;根据候选查询集中每个查询的指标参数,选取指标参数满足与指标参数对应的预设参数条件的查询作为目标查询。
需要说明的是,目标查询是从最后迭代生成的候选查询集中选取的。
在一些实施例中,指标参数可以包括候选查询集中查询所对应的基表的数据量、查询的查询次数和查询的收敛比。考虑到数据小的表查询所需要的成本并不高,因此,针对此类表对应的查询可以无需建立物化视图。查询的查询次数可以反映后续该查询对应的物化视图的命中率。因此,查询次数越高的查询越适合建立物化视图。
在一些实施例中,可以通过读取数据库中基表所占的存储空间获取到基表的数据量。
在一些实施例中,查询的查询次数可以通过该查询的查询次数字段获取。
在一些实施例中,收敛比可以基于维度列字段对应的维度列的行数确定,该行数可以理解为该维度列的取值个数。示例地,收敛比可以为计算前的行数与计算后的行数的比值。例如,针对表3中索引为5的查询,若维度列字段表征的a维度列在基表中的对应的行数为1000行(即计算前的行数),即存在1000个取值,且索引为5的查询对应的计算出来的查询结果的行数为10行(即计算后的行数),即查询结果对应得到的a维度列的取值只有10个,则此时收敛比则为10。
在一些实施例中,若候选查询集中的同一个查询对应的维度列为多个,则可以将每个维度列对应的收敛比之和与维度列的个数的比值确定为该查询对应的收敛比。
在一些实施例中,若指标参数包括多个,则目标查询是所有指标参数均满足分别对应的预设参数条件的查询。
示例地,以指标参数包括查询所对应的基表的数据量、查询的查询次数以及查询的收敛比为例,在基表的数据量小于预设数据量阈值、查询次数大于预设查询次数阈值且收敛比大于预设收敛比阈值的查询为目标查询。
其中,预设数据量阈值、预设查询次数阈值以及预设收敛比阈值可以根据实际情况进行设定。本实施例在此不作限定。
通过上述方式,采用多维度的指标参数来综合选择目标查询,更为全面的选取目标选择,提高了目标查询的优化效果,进而提高了物化视图的质量,对于后续基于物化视图的应用提供了可靠的数据基础。
在一些实施例中,图1所示的步骤104可以通过以下方式实施:设置与目标查询对应的物化视图的数据时段,其中,物化视图包括在与目标查询的表字段对应的基表中的与数据时段对应的数据;根据目标查询和数据时段,生成数据定义语言;执行数据定义语言,以创建与目标查询对应的物化视图。
需要说明的是,数据时段表征物化视图的初始化数据的数据范围。例如,数据时段可以是2天内满足查询条件的数据。基于此,创建的物化视图包括两天的数据。
考虑到在相关物化视图的应用场景中,会基于基表的数据更新对物化视图执行更新。特别是随日期更新的数据场景,随着日期的不断延长,为更新数据,物化视图的数据量增大,导致物化视图的存储成本增加,但基于该类物化视图,若基于此物化视图的查询又偏向于查询最近数据,因此,物化视图中越先存入的数据未被命中的概率增加。基于此,可以通过设置数据时段确保对物化视图中过期数据进行舍弃,以此降低存储成本。
示例地,在用户偏好查询三天内的数据的应用场景下,初始化创建的物化视图包括数据时段为3天(例如1号-3号)的数据,在基表更新4号的数据后,在相关现有技术会将4号中满足该物化视图对应的查询的筛选条件的数据更新至物化视图中,此时,物化视图中包括1号-4号的数据。但由于用户偏向于查询最近三天的数据,因此,对于该物化视图中的1号数据的命中率下降。随着日期的不断延长,将极大的增加物化视图的存储成本。但采用本公开实施例提高的方案,可以基于设置的数据时段将物化视图中属于1号的数据删除,并将4号中满足该物化视图对应的查询的筛选条件的数据更新至物化视图中,如此,在可以满足查询最近3天的相关数据的情况下降低物化视图的存储成本。
可以理解的是,数据时段可以基于目标查询对应的历史记录来确定。该历史记录表征了每次查询所需要的数据的时段。
示例地,可以在创建物化视图的同时创建更新任务,用于物化视图的更新。
示例地,数据定义语言可以是可执行的SQL语句。基于该类语句,实现物化视图的创建。
在一些实施例中,方法还包括:检测物化视图是否属于失效视图;在检测到物化视图属于所述失效视图的情况下,删除物化视图。
在一些实施例中,失效视图可以是预设时长内未命中的物化视图。例如,在检测到物化视图在3天内未被命中时,即可确定该物化视图为失效视图。
在一些实施例中,失效视图还可以是预设时长内未命中预设次数的物化视图。
其中,预设时长可以根据实际情况进行设定,预设时长可以是1天,也可以是2天。本实施例在此不作限定。
在一些实施例中,失效视图可以是对应的基表已经删除的物化视图。例如,物化视图包括的查询结果是基表A中的数据,在检测到基表A从存储中删除时,即可确定该物化视图为失效视图。
在一些实施例中,失效视图可以是对应的基表数据发生变更的物化视图。例如,物化视图包括的查询结果是基表A中的数据,在检测到基表A部分数据发生变化时,即可确定该物化视图为失效视图。考虑到为确保数据的一致性,可以在物化视图对应的基表中的数据发生变化时,删除物化视图,避免查询基于数据错误的物化视图得到错误的查询结果。
在一些实施例中,前述的检测可以基于为物化视图创建的相应的监控任务实现。
在一些实施例中,删除物化视图可以通过生成相应的SQL执行语句实现。在一些实施例中,在删除物化视图后,还可以将为该物化视图所创建的任务删除,例如,更新任务,监控任务。
基于同一发明构思,本公开实施例提供一种物化视图的创建装置,参照图3,该创建装置300包括:
提取模块301,用于提取历史查询语句中的原始子查询;
生成模块302,用于根据所述原始子查询,按照预设规则生成候选查询集;
确定模块303,用于从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
创建模块304,用于创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
可选地,所述原始子查询包括表字段,所述生成模块302包括:
分组子模块,用于根据所述原始子查询的表字段,将具有相同表字段的原始子查询划分为同一组,得到多个原始子查询集;
生成子模块,用于针对每个所述原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集。
可选地,所述生成子模块包括:
第一确定子单元,用于将所述原始子查询集作为当前子查询集;
合并子单元,用于根据所述当前子查询集中的查询,按照预设合并规则生成候选查询集;
排序子单元,用于对所述候选查询集中的查询按照预设排序规则进行排序;并根据排序结果,从所述排序结果中选择预设数量的查询,组成新的候选查询集;
第二确定子单元,用于在确定不满足预设迭代条件的情况下,将新的候选查询集作为新的当前子查询集。
可选地,所述候选查询集中的查询包括过滤条件字段、维度列字段、指标列字段和查询次数字段,所述合并子单元具体用于针对所述当前子查询集中的每两个查询,根据该两个查询的过滤条件字段、维度列字段、指标列字段和查询次数字段,对该两个查询进行合并;根据合并结果和所述当前子查询集中包括的查询,生成候选查询集。
可选地,所述确定模块303包括:
计算子模块,用于计算所述候选查询集中每个查询的指标参数;
选取子模块,用于根据所述候选查询集中每个查询的指标参数,选取所述指标参数满足与所述指标参数对应的预设参数条件的查询作为目标查询。
可选地,所述创建模块304包括:
设置子模块,用于设置与所述目标查询对应的物化视图的数据时段,其中,所述物化视图包括在与所述目标查询的表字段对应的基表中的与所述数据时段对应的数据;
语句生成子模块,用于根据所述目标查询和所述数据时段,生成数据定义语言;
执行子模块,用于执行所述数据定义语言,以创建与所述目标查询对应的物化视图。
可选地,所述创建装置300还包括:
检测模块,用于检测所述物化视图是否属于失效视图;
删除模块,用于在检测到所述物化视图属于所述失效视图的情况下,删除所述物化视图。
基于同一发明构思,本公开实施例提供一种计算机可读介质,其上存储有计算机程序,该程序被处理装置执行时实现方法实施例中所述创建方法的步骤。
基于同一发明构思,本公开实施例提供一种电子设备,包括:
存储装置,其上存储有计算机程序;
处理装置,用于执行所述存储装置中的所述计算机程序,以实现方法实施例中所述创建方法的步骤。
下面参考图4,其示出了适于用来实现本公开实施例的电子设备400的结构示意图。本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图6示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图4所示,电子设备400可以包括处理装置(例如中央处理器、图形处理器等)401,其可以根据存储在只读存储器(ROM)402中的程序或者从存储装置408加载到随机访问存储器(RAM)403中的程序而执行各种适当的动作和处理。在RAM 403中,还存储有电子设备400操作所需的各种程序和数据。处理装置401、ROM 402以及RAM 403通过总线404彼此相连。输入/输出(I/O)接口405也连接至总线404。
通常,以下装置可以连接至I/O接口405:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置406;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置407;包括例如磁带、硬盘等的存储装置408;以及通信装置409。通信装置409可以允许电子设备400与其他设备进行无线或有线通信以交换数据。虽然图4示出了具有各种装置的电子设备400,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置409从网络上被下载和安装,或者从存储装置408被安装,或者从ROM 402被安装。在该计算机程序被处理装置401执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施例中,电子设备可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:提取历史查询语句中的原始子查询;根据所述原始子查询,按照预设规则生成候选查询集;从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定,例如,提取模块还可以被描述为“提取历史查询语句中的原始子查询的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,示例1提供了一种物化视图的创建方法,包括:
提取历史查询语句中的原始子查询;
根据所述原始子查询,按照预设规则生成候选查询集;
从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
根据本公开的一个或多个实施例,示例2提供了示例1的方法,所述原始子查询包括表字段,所述根据所述原始子查询,按照预设规则生成候选查询集,包括:
根据所述原始子查询的表字段,将具有相同表字段的原始子查询划分为同一组,得到多个原始子查询集;
针对每个所述原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集。
根据本公开的一个或多个实施例,示例3提供了示例2的方法,所述针对每个所述原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集,包括:
针对每个所述原始子查询集,执行以下步骤:
将所述原始子查询集作为当前子查询集;
根据所述当前子查询集中的查询,按照预设合并规则生成候选查询集;
对所述候选查询集中的查询按照预设排序规则进行排序;并,
根据排序结果,从所述排序结果中选择预设数量的查询,组成新的候选查询集;
在确定不满足预设迭代条件的情况下,将新的候选查询集作为新的当前子查询集,并返回执行所述按照预设组合规则,根据所述当前子查询集中的查询,生成候选查询集的步骤。
根据本公开的一个或多个实施例,示例4提供了示例3的方法,所述候选查询集中的查询包括过滤条件字段、维度列字段、指标列字段和查询次数字段,所述根据所述当前子查询集中的查询,按照预设合并规则生成候选查询集,包括:
针对所述当前子查询集中的每两个查询,根据该两个查询的过滤条件字段、维度列字段、指标列字段和查询次数字段,对该两个查询进行合并;
根据合并结果和所述当前子查询集中包括的查询,生成候选查询集。
根据本公开的一个或多个实施例,示例5提供了示例1的方法,所述从所述候选查询集中确定目标查询,包括:
计算所述候选查询集中每个查询的指标参数;
根据所述候选查询集中每个查询的指标参数,选取所述指标参数满足与所述指标参数对应的预设参数条件的查询作为目标查询。
根据本公开的一个或多个实施例,示例6提供了示例1-5中任一一项的方法,所述创建与所述目标查询对应的物化视图,包括:
设置与所述目标查询对应的物化视图的数据时段,其中,所述物化视图包括在与所述目标查询的表字段对应的基表中的与所述数据时段对应的数据;
根据所述目标查询和所述数据时段,生成数据定义语言;
执行所述数据定义语言,以创建与所述目标查询对应的物化视图。
根据本公开的一个或多个实施例,示例7提供了示例1-5中任一一项的方法,所述方法还包括:
检测所述物化视图是否属于失效视图;
在检测到所述物化视图属于所述失效视图的情况下,删除所述物化视图。
根据本公开的一个或多个实施例,示例8提供了一种物化视图的创建装置,包括:
提取模块,用于提取历史查询语句中的原始子查询;
生成模块,用于根据所述原始子查询,按照预设规则生成候选查询集;
确定模块,用于从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
创建模块,用于创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
根据本公开的一个或多个实施例,示例9提供了计算机可读介质,其上存储有计算机程序,其特征在于,该程序被处理装置执行时实现示例1-7中任一项所述方法的步骤。
根据本公开的一个或多个实施例,示例10提供了一种电子设备,其特征在于,包括:
存储装置,其上存储有计算机程序;
处理装置,用于执行所述存储装置中的所述计算机程序,以实现示例1-7中任一项所述方法的步骤。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。

Claims (10)

  1. 一种物化视图的创建方法,其特征在于,包括:
    提取历史查询语句中的原始子查询;
    根据所述原始子查询,按照预设规则生成候选查询集;
    从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
    创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
  2. 根据权利要求1所述的创建方法,其特征在于,所述原始子查询包括表字段,所述根据所述原始子查询,按照预设规则生成候选查询集,包括:
    根据所述原始子查询的表字段,将具有相同表字段的原始子查询划分为同一组,得到多个原始子查询集;
    针对每个所述原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集。
  3. 根据权利要求2所述的创建方法,其特征在于,所述针对每个所述原始子查询集,根据该原始子查询集中的原始子查询,按照预设规则生成候选查询集,包括:
    针对每个所述原始子查询集,执行以下步骤:
    将所述原始子查询集作为当前子查询集;
    根据所述当前子查询集中的查询,按照预设合并规则生成候选查询集;
    对所述候选查询集中的查询按照预设排序规则进行排序;并,
    根据排序结果,从所述排序结果中选择预设数量的查询,组成新的候选查询集;
    在确定不满足预设迭代条件的情况下,将新的候选查询集作为新的当前子查询集,并返回执行所述按照预设组合规则,根据所述当前子查询集中的查询,生成候选查询集的步骤。
  4. 根据权利要求3所述的创建方法,其特征在于,所述候选查询集中的查询包括过滤条件字段、维度列字段、指标列字段和查询次数字段,所述根据所述当前子查询集中的查询,按照预设合并规则生成候选查询集,包括:
    针对所述当前子查询集中的每两个查询,根据该两个查询的过滤条件字段、维度列字段、指标列字段和查询次数字段,对该两个查询进行合并;
    根据合并结果和所述当前子查询集中包括的查询,生成候选查询集。
  5. 根据权利要求1所述的创建方法,其特征在于,所述从所述候选查询集中确定目标查询,包括:
    计算所述候选查询集中每个查询的指标参数;
    根据所述候选查询集中每个查询的指标参数,选取所述指标参数满足与所述指标参数对应的预设参数条件的查询作为目标查询。
  6. 根据权利要求1-5中任一所述的创建方法,其特征在于,所述创建与所述目标查询对应的物化视图,包括:
    设置与所述目标查询对应的物化视图的数据时段,其中,所述物化视图包括在与所述目标查询的表字段对应的基表中的与所述数据时段对应的数据;
    根据所述目标查询和所述数据时段,生成数据定义语言;
    执行所述数据定义语言,以创建与所述目标查询对应的物化视图。
  7. 根据权利要求1-5中任一所述的创建方法,其特征在于,所述方法还包括:
    检测所述物化视图是否属于失效视图;
    在检测到所述物化视图属于所述失效视图的情况下,删除所述物化视图。
  8. 一种物化视图的创建装置,其特征在于,包括:
    提取模块,用于提取历史查询语句中的原始子查询;
    生成模块,用于根据所述原始子查询,按照预设规则生成候选查询集;
    确定模块,用于从所述候选查询集中确定目标查询,其中,所述目标查询是查询收益大于预设收益阈值的查询;
    创建模块,用于创建与所述目标查询对应的物化视图,其中,所述物化视图用于处理相应的查询。
  9. 一种计算机可读介质,其上存储有计算机程序,其特征在于,该程序被处理装置执行时实现权利要求1-7中任一项所述方法的步骤。
  10. 一种电子设备,其特征在于,包括:
    存储装置,其上存储有计算机程序;
    处理装置,用于执行所述存储装置中的所述计算机程序,以实现权利要求1-7中任一项所述方法的步骤。
PCT/CN2022/109471 2021-09-03 2022-08-01 物化视图的创建方法、装置、存储介质及电子设备 WO2023029855A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111032832.8A CN113986933A (zh) 2021-09-03 2021-09-03 物化视图的创建方法、装置、存储介质及电子设备
CN202111032832.8 2021-09-03

Publications (1)

Publication Number Publication Date
WO2023029855A1 true WO2023029855A1 (zh) 2023-03-09

Family

ID=79735333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/109471 WO2023029855A1 (zh) 2021-09-03 2022-08-01 物化视图的创建方法、装置、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN113986933A (zh)
WO (1) WO2023029855A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829600A (zh) * 2023-12-29 2024-04-05 浙江大学 基于交互可视分析的风险企业规则模型构建与查询方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113986933A (zh) * 2021-09-03 2022-01-28 北京火山引擎科技有限公司 物化视图的创建方法、装置、存储介质及电子设备
CN114218263B (zh) * 2022-02-23 2022-05-13 浙江一山智慧医疗研究有限公司 物化视图的自动创建方法及基于物化视图的快速查询方法
CN114547086B (zh) * 2022-04-22 2022-07-22 网易(杭州)网络有限公司 数据处理方法、装置、设备及计算机可读存储介质
CN115757464B (zh) * 2022-11-18 2023-07-25 中国科学院软件研究所 一种基于深度强化学习的智能物化视图查询方法
CN116108076B (zh) * 2023-04-10 2023-07-18 之江实验室 物化视图查询方法、系统、设备及存储介质
CN116541377B (zh) * 2023-04-27 2024-05-14 阿里巴巴(中国)有限公司 任务的物化视图的处理方法、系统和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080147598A1 (en) * 2006-12-14 2008-06-19 Ianywhere Solutions, Inc. Query optimization using materialized views in database management systems
US20090319476A1 (en) * 2008-06-20 2009-12-24 Chris Olston Adaptive materialized view selection for databases
CN111597209A (zh) * 2020-04-30 2020-08-28 清华大学 一种数据库物化视图构建系统、方法以及系统创建方法
CN112286953A (zh) * 2020-09-25 2021-01-29 北京邮电大学 多维数据查询方法、装置和电子设备
CN112740199A (zh) * 2018-09-21 2021-04-30 华为技术有限公司 用于数据库查询优化的物化视图
CN113986933A (zh) * 2021-09-03 2022-01-28 北京火山引擎科技有限公司 物化视图的创建方法、装置、存储介质及电子设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250457B (zh) * 2016-07-28 2019-07-12 国网山东省电力公司电力科学研究院 大数据平台物化视图的查询处理方法及系统
CN112162983A (zh) * 2020-09-22 2021-01-01 北京人大金仓信息技术股份有限公司 数据库索引建议处理方法、装置、介质和电子设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080147598A1 (en) * 2006-12-14 2008-06-19 Ianywhere Solutions, Inc. Query optimization using materialized views in database management systems
US20090319476A1 (en) * 2008-06-20 2009-12-24 Chris Olston Adaptive materialized view selection for databases
CN112740199A (zh) * 2018-09-21 2021-04-30 华为技术有限公司 用于数据库查询优化的物化视图
CN111597209A (zh) * 2020-04-30 2020-08-28 清华大学 一种数据库物化视图构建系统、方法以及系统创建方法
CN112286953A (zh) * 2020-09-25 2021-01-29 北京邮电大学 多维数据查询方法、装置和电子设备
CN113986933A (zh) * 2021-09-03 2022-01-28 北京火山引擎科技有限公司 物化视图的创建方法、装置、存储介质及电子设备

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829600A (zh) * 2023-12-29 2024-04-05 浙江大学 基于交互可视分析的风险企业规则模型构建与查询方法

Also Published As

Publication number Publication date
CN113986933A (zh) 2022-01-28

Similar Documents

Publication Publication Date Title
WO2023029855A1 (zh) 物化视图的创建方法、装置、存储介质及电子设备
KR102627690B1 (ko) Sql 질의 플랜들을 최적화하기 위한 차원 콘텍스트 전파 기술들
US10585913B2 (en) Apparatus and method for distributed query processing utilizing dynamically generated in-memory term maps
US10942926B2 (en) Identification, and query, of semantically-related database tables
US9575966B2 (en) Optimizing relational database queries with multi-table predicate expressions
US20230177078A1 (en) Conversational Database Analysis
US20180165346A1 (en) Multi-dimensional analysis using named filters
US10157234B1 (en) Systems and methods for transforming datasets
US11074267B2 (en) Staged approach to automatic data discovery and performance
US10579500B2 (en) Debugging for SQL statement
US20160070754A1 (en) System and method for microblogs data management
WO2024021790A1 (zh) 一种基于数据湖的虚拟列构建方法以及数据查询方法
CN108733688B (zh) 数据分析的方法、装置
US10776368B1 (en) Deriving cardinality values from approximate quantile summaries
US20210209122A1 (en) Information push method and apparatus, device, and storage medium
CN116955413A (zh) 基于线上分析处理的数据查询方法、装置、介质及设备
CN111159221A (zh) 一种通过动态构建立方体进行数据处理或查询的方法
WO2022068348A1 (zh) 关系图谱构建方法、装置、电子设备及存储介质
CN114969044A (zh) 一种基于数据湖的物化列创建方法以及数据查询方法
CN117891979B (zh) 血缘图谱构建方法、装置、电子设备和可读介质
Cheng et al. Generic cumulative annular bucket histogram for spatial selectivity estimation of spatial database management system
US20230315715A1 (en) Utilizing a structured audit log for improving accuracy and efficiency of database auditing
CN110674401B (zh) 确定搜索项目的顺序的方法、装置和电子设备
US11880366B2 (en) Index-based, adaptive join size estimation
US20240135196A1 (en) Method and apparatus for knowledge graph construction, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862996

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE