CN114218263B - Materialized view automatic creation method and materialized view based quick query method - Google Patents

Materialized view automatic creation method and materialized view based quick query method Download PDF

Info

Publication number
CN114218263B
CN114218263B CN202210164998.3A CN202210164998A CN114218263B CN 114218263 B CN114218263 B CN 114218263B CN 202210164998 A CN202210164998 A CN 202210164998A CN 114218263 B CN114218263 B CN 114218263B
Authority
CN
China
Prior art keywords
materialized
materialized view
frequent item
query
item set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210164998.3A
Other languages
Chinese (zh)
Other versions
CN114218263A (en
Inventor
高春蓉
李俊
许振影
俞青
应晶
傅亦婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Yishan Intelligent Medical Research Co ltd
Original Assignee
Zhejiang Yishan Intelligent Medical Research Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Yishan Intelligent Medical Research Co ltd filed Critical Zhejiang Yishan Intelligent Medical Research Co ltd
Priority to CN202210164998.3A priority Critical patent/CN114218263B/en
Publication of CN114218263A publication Critical patent/CN114218263A/en
Application granted granted Critical
Publication of CN114218263B publication Critical patent/CN114218263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2428Query predicate definition using graphical user interfaces, including menus and forms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Abstract

The application provides an automatic creation method of a materialized view, which comprises the following steps: acquiring a historical query record; extracting at least one frequent item set from all historical query sentences; all frequent item sets related to the same basic data table are ordered from strong to weak according to relevance to obtain a frequent item set sequence set; sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings; and sequentially creating corresponding materialized views for the second time according to the frequent item set sequence sets until the calculated yield is less than or equal to the screening condition after any materialized view is created. According to the method, a plurality of data elements are extracted through analyzing historical query records to conduct frequent item set mining, corresponding materialized views are automatically created according to frequent item sets, the number of the materialized views is controlled through the yield of the materialized views, and storage space is reasonably planned and utilized.

Description

Materialized view automatic creation method and materialized view based quick query method
Technical Field
The application relates to the technical field of databases, in particular to an automatic creation method of materialized views and a quick query method based on materialized views.
Background
A database refers to an organized, sharable collection of data stored long term in a computer. The data in the database is organized, described and stored according to a certain data model, and the data management system has low redundancy, high data independence and easy expansibility.
However, with the advent of the big data age, the amount of data in the database is more and more huge, and therefore the response time of the query is longer and longer. The materialized view is an effective query means, is a physical structure for improving data access time by pre-calculating and storing intermediate results, and can greatly shorten the query response time. That is, some large and time-consuming computations and query results are implemented with materialized views, which improves the efficiency of the query.
At present, materialized views are mostly created through manual selection, and later-period updating and maintenance of the materialized views are less manual participation. For a database with huge data volume in the big data era, the method for creating, updating and maintaining the materialized view by manual selection is extremely low in efficiency. Moreover, the materialized view needs to occupy a certain storage space, and if too many materialized views are created, the storage occupation is too large and the data insertion speed is slow, so that the response speed in query is influenced.
Disclosure of Invention
According to the method, a plurality of data elements are extracted through analyzing historical query statements in historical query records to conduct frequent item set mining, corresponding materialized views are automatically created according to frequent item sets, the number of finally created materialized views is controlled according to benefits brought by the creation of the materialized views, storage space is reasonably planned and utilized, and optimization of storage space utilization and maximization of query speed improvement are achieved. The method comprises the following steps:
obtaining historical query records, wherein the historical query records comprise at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
mining at least one frequent item set from all the historical query sentences, wherein the frequent item set is a set of fields or field combinations which have strong relevance with at least one basic data table;
all frequent item sets related to the same basic data table are ordered from strong to weak according to relevance to obtain a frequent item set sequence set;
sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings;
and sequentially creating corresponding materialized views for the second time according to the frequent item set sequence set until the calculated yield is less than or equal to the screening condition after any one materialized view is created.
In a second aspect, the embodiment of the application provides a fast query method based on materialized views, the method uses the automatic creation method of materialized views in the first aspect to create historical materialized views in advance, and the query speed is increased through the historical materialized views. The method comprises the following steps:
obtaining at least one historically materialized view according to the automatic creation method of the materialized view of any one of the first aspect;
acquiring a query statement;
extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
matching an optimal materialized view from all the historical materialized views according to the matching condition;
and obtaining a query result from the optimal materialized view according to the query statement.
In a third aspect, an embodiment of the present application provides an automatic creation apparatus for a materialized view, which is used to implement the automatic creation method for a materialized view described in the first aspect, and the apparatus includes the following modules:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring historical query records, and the historical query records comprise at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
a first parsing module for mining at least one frequent item set from all the historical query statements, wherein the frequent item set is a set of fields or field combinations having strong relevance to at least one of the base data tables;
the sorting module is used for sorting all frequent item sets related to the same basic data table from strong to weak according to the relevance to obtain a frequent item set sequence set;
the first creating module is used for sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings;
and the second creation module is used for sequentially creating corresponding materialized views according to the frequent item set sequence set for the second time until the calculated yield is less than or equal to the screening condition after any one of the materialized views is created.
In a fourth aspect, an embodiment of the present application provides a fast query apparatus based on a materialized view, which is used to implement the fast query method based on a materialized view in the second aspect, and the apparatus includes the following modules:
a creating module, configured to obtain at least one historically materialized view according to the method for automatically creating a materialized view according to any one of the first aspects;
the second acquisition module is used for acquiring the query statement;
the second analysis module is used for extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
the matching module is used for matching an optimal materialized view from all the historical materialized views according to the matching condition;
and the query module is used for acquiring a query result from the optimal materialized view according to the query statement.
In a fifth aspect, an embodiment of the present application provides an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for automatically creating a materialized view or the method for fast querying based on the materialized view according to any of the embodiments of the present application.
In a sixth aspect, embodiments of the present application provide a readable storage medium having stored therein a computer program comprising program code for controlling a process to execute a process, the process comprising an automatic creation method of a materialized view or a fast query method based on a materialized view according to any of the embodiments of the present application.
The main contributions and innovation points of the present application are as follows:
1. in some application embodiments of the present application, an automatic creation method of materialized views is provided, where each historical query statement in a historical query record is analyzed, a data element, such as a data table, a field, or a field combination, in the historical query statement is extracted, a frequent item set is mined from the historical query statement, a corresponding materialized view is automatically created, a yield rate brought by a newly created materialized view is dynamically analyzed, a screening condition is determined according to the yield rate to control the number of the finally created materialized views, and a dynamic balance effect is achieved, so that query efficiency is significantly improved. In some application embodiments, the storage space required for creating the materialized view is regarded as overhead, the query speed change is regarded as the benefit brought by the materialized view, and the benefit rate brought by the materialized view is calculated according to the ratio of the storage space change to the query speed change.
2. According to the method, historical materialized views are obtained at one time or multiple times according to an automatic creating method of the materialized views, after new query statements are obtained, data tables and fields or field combinations are extracted from the query statements and serve as matching conditions, optimal materialized views are matched from all historical materialized views according to the matching conditions, and finally query results can be quickly obtained from the optimal materialized views according to the query statements. And in some application embodiments, a matching method for the optimal materialized view is defined. The method and the device realize automatic creation of the historical materialized view according to the historical query records, and simultaneously automatically match the optimal materialized view for the new query statement so as to improve the efficiency of data query.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for automatic creation of materialized views according to a first embodiment of the present application;
FIG. 2 is a flow chart of a fast query method based on materialized view according to the second embodiment of the present application;
fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Example one
In the embodiment, as shown in fig. 1, the method includes steps S1-S5:
step S1: obtaining historical query records, wherein the historical query records comprise at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
step S2: mining at least one frequent item set from all the historical query sentences, wherein the frequent item set is a set of fields or field combinations which have strong relevance with at least one basic data table;
step S3: all frequent item sets related to the same basic data table are ordered from strong to weak according to relevance to obtain a frequent item set sequence set;
step S4: sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings;
step S5: and sequentially creating corresponding materialized views for the second time according to the frequent item set sequence set until the calculated yield is less than or equal to the screening condition after any one materialized view is created.
According to the method, each historical query statement in the historical query record is analyzed, data elements, such as data tables, fields or field combinations, in the historical query record are extracted, frequent item sets are mined out from the historical query record, corresponding materialized views are automatically created, storage space changes and query speed changes brought by the newly created materialized views are dynamically analyzed, the number of the created materialized views is controlled, the effect of dynamic balance is achieved, and the query efficiency is remarkably improved.
In step S1, a historical query record is first obtained, where the historical query record includes at least one historical query statement. Each historical query statement relates to at least one base data table in at least one database and to data of at least one field in the base data table.
In the present embodiment, the term "related to" is not limited to an inclusive meaning. For example, the historical query statement includes the basic data table B of the database a, but the purpose is to obtain data of all fields in the basic data table B, and "select from B" is usually used, that is, the historical query statement does not include any fields in the basic data table B, but refers to data of fields in the basic data table B.
Specifically, the historical query records of the database can be obtained by querying a session table of the database or analyzing a log of the database.
In step S2, at least one frequent item set, which is a set of fields or field combinations having strong association with at least one underlying data table, is extracted from all the historical query statements.
Wherein the field is an included field or a filtered field; the field combinations are included field combinations or filtered field combinations. The included fields are fields corresponding to the data desired to be queried in the historical query statement, and the filter fields are fields used to screen out a portion of the data in the historical query. Similarly, in the historical query statement, one filtering field may be used to screen out a certain part of data, or a plurality of filtering fields may be used to screen out a certain part of data.
In addition, the term "strong relevance" in the present embodiment means ranking the relevance in order from strong to weak, and then ranking the relevance within a set value. For example, the set of all analyzed fields or field combinations related to at least one basic data table is sorted according to the relevance, and the set value is 5, then the set of relevance rank 5 is the frequent item set in this embodiment.
Specifically, a frequent item set mining algorithm may be used to extract at least one frequent item set from all historical query statements. For example, support is defined as the frequency with which disjoint subsets A, B occur simultaneously in dataset D, confidence is defined as the frequency with which B also occurs when A occurs, and terms with support greater than a minimum support threshold and confidence greater than a minimum confidence threshold are integrated into a frequent set of terms.
In step S3, all frequent item sets related to the same basic data table are sorted from strong to weak according to relevance to obtain a frequent item set order set. The strength of the relevance is determined by the size of the support degree, and the relevance of the frequent item set with the higher support degree is stronger.
In step S4, the corresponding materialized views are sequentially created for the first time according to the frequent itemset order set, and after each materialized view is created, the yield rate brought by the creation of the materialized view is calculated, and a screening condition is determined according to the yield rate brought by each materialized view, which is mainly used for controlling the number of the finally created materialized views.
Specifically, the storage space required for creating the materialized view can be regarded as overhead, and the query speed change can be regarded as the benefit brought by the materialized view, so that the benefit rate brought by the materialized view is the ratio of the storage space change to the query speed change.
In the embodiment, a screening condition is determined according to the yield rate brought by each materialized view when the materialized views are created for the first time, the number of the finally created materialized views is controlled, and the remaining storage space is reasonably planned and utilized, so that the utilization effect of the storage space is maximized, and the query efficiency is optimized.
In some embodiments, the space usage change is a ratio of a difference between a duty cycle before and after any one of the materialized views is established to a storage space after the materialized view is established; the query speed change is the ratio of the difference between the query speeds before and after the establishment of any materialized view to the query speed after the establishment of the materialized view.
For example, the storage space after a certain materialized view is established is A, the difference between the storage space before the materialized view is Δ A, and Δ A/A is the change of the space occupation for establishing the materialized view; the query speed after the materialized view is established is B, the difference between the query speed before the materialized view is Δ B, and then the Δ B/B is the query speed change for establishing the materialized view. Therefore, the ratio of the change in space occupancy to the change in query speed is (Δ A/A)/(. Δ B/B).
In some embodiments, the screening condition is the median of all the ratios. For example, 10 frequent item sets are obtained through step S2, and then establishing a corresponding materialized view according to each frequent item set would obtain a corresponding yield, so a median of 10 yields is selected as a filtering condition for creating the materialized view this time.
In addition, in some embodiments, methods are also provided to avoid creating unnecessary materialized views.
For example: before the step of sequentially creating the corresponding materialized views according to the frequent itemset sequence set for the first time, the method comprises the following steps: and judging whether the sum of the data volumes of all the basic data tables related to each frequent item set reaches a preset data volume, and if not, deleting the frequent item set from the frequent item set sequence set.
Or: when the corresponding materialized views are sequentially created according to the frequent item set sequence set for the first time, the method comprises the following steps: before any materialized view is created, whether the residual space meets the requirement of creating the materialized view is judged, and if not, the frequent item set is deleted from the frequent item set sequence set.
Or: when the corresponding materialized views are sequentially created according to the frequent item set sequence set for the first time, the method comprises the following steps: before any materialized view is created, whether the materialized view exists or not is judged, and if the materialized view exists, the frequent item set is deleted from the frequent item set sequence set.
In step S5, creating the corresponding materialized views in sequence according to the frequent itemset order sets for the second time, and stopping until the calculated profitability after creating any one of the materialized views is less than or equal to the screening condition.
Therefore, the problems that storage space is excessively occupied due to the fact that too many materialized views are created at one time, the materialized views are slowly updated and maintained at a later stage can be avoided, the rest storage space can be used for storing other materialized views created manually or periodically, a certain storage space is reserved for improving the query efficiency each time, and the query efficiency is improved and optimized by optimizing the utilization of the storage space.
Example two
According to an automatic creation method of a materialized view in the first embodiment, the present embodiment provides a fast query method based on a materialized view, and the method, as shown in fig. 2, includes the following steps:
obtaining at least one historically materialized view according to the automatic creation method of the materialized view described in the first embodiment;
acquiring a query statement;
extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
matching an optimal materialized view from all the historical materialized views according to the matching condition;
and obtaining a query result from the optimal materialized view according to the query statement.
The method includes the steps that the historical materialized view is obtained once or for multiple times by the aid of the automatic creation method of the materialized view in the first embodiment, after a new query statement is obtained, data tables and fields or field combinations are extracted from the query statement and serve as matching conditions, the optimal materialized view is matched from all the historical materialized views according to the matching conditions, and finally query results can be quickly obtained from the optimal materialized view according to the query statement.
In some specific embodiments, some matching rules are set to obtain the optimal materialized view, that is, the optimal materialized view needs to satisfy a certain condition.
For example: the optimal materialized view is the materialized view which contains the least number of fields in the materialized view containing all the fields or all the data corresponding to the field combination; or when the fields in the query statement comprise the filter fields, the optimal materialized view is a materialized view which comprises all the fields or data corresponding to all the field combinations and is sorted by adopting any filter field; or the optimal materialized view simultaneously satisfies the two conditions.
EXAMPLE III
Based on the same concept, this embodiment further provides an automatic creation apparatus of a materialized view, which is used to implement the automatic creation method of the materialized view described in the first embodiment, and the apparatus includes:
a first obtaining module, configured to obtain a historical query record, where the historical query record includes at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
a first parsing module for mining at least one frequent item set from all the historical query statements, wherein the frequent item set is a set of fields or field combinations having strong relevance to at least one of the base data tables;
the sorting module is used for sorting all frequent item sets related to the same basic data table from strong to weak according to the relevance to obtain a frequent item set sequence set;
the first creating module is used for sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings;
and the second creation module is used for sequentially creating corresponding materialized views according to the frequent item set sequence set for the second time until the calculated yield is less than or equal to the screening condition after any one of the materialized views is created.
Based on the same concept, this embodiment further provides a fast query apparatus based on a materialized view, which is used to implement the fast query method based on a materialized view described in embodiment two, and the apparatus includes:
the creation module is used for obtaining at least one historically materialized view according to the automatic creation method of the materialized view in the first embodiment;
the second acquisition module is used for acquiring the query statement;
the second analysis module is used for extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
the matching module is used for matching an optimal materialized view from all the historical materialized views according to the matching conditions;
and the query module is used for acquiring a query result from the optimal materialized view according to the query statement.
Example four
The embodiment further provides an electronic apparatus, referring to fig. 3, including a memory 404 and a processor 402, where the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps of the method for automatically creating a materialized view or the method for fast querying based on a materialized view in any one of the above embodiments.
Specifically, the processor 402 may include a Central Processing Unit (CPU), or A Specific Integrated Circuit (ASIC), or may be configured to implement one or more Integrated circuits of the embodiments of the present Application.
Memory 404 may include, among other things, mass storage 404 for data or instructions. By way of example, and not limitation, memory 404 may include a Hard Disk Drive (Hard Disk Drive, abbreviated HDD), a floppy Disk Drive, a Solid State Drive (SSD), flash memory, an optical disc, a magneto-optical disc, tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. The memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a Non-Volatile (Non-Volatile) memory. In particular embodiments, Memory 404 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically rewritable ROM (EAROM), or FLASH Memory (FLASH), or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory 404 (FPMDRAM), an Extended data output Dynamic Random-Access Memory (eddram), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
Memory 404 may be used to store or cache various data files needed for processing and/or communication purposes, as well as possibly computer program instructions executed by processor 402.
The processor 402, by reading and executing computer program instructions stored in the memory 404, implements any of the above embodiments of the method for automated creation of materialized views or the method for fast querying based on materialized views.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmitting device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include wired or wireless networks provided by communication providers of the electronic devices. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmitting device 406 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The input and output devices 408 are used to input or output information. In the present embodiment, the input information may be a current data table such as a caption document, feature data, a template table, etc., and the output information may be a feature fingerprint, a fingerprint template, text classification recommendation information, a file template configuration mapping table, a file template configuration information table, etc.
Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by a computer program:
obtaining historical query records, wherein the historical query records comprise at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
mining at least one frequent item set from all the historical query sentences, wherein the frequent item set is a set of fields or field combinations which have strong relevance with at least one basic data table;
all frequent item sets related to the same basic data table are ordered from strong to weak according to relevance to obtain a frequent item set sequence set;
sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding earnings and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the earnings;
and sequentially creating corresponding materialized views for the second time according to the frequent item set sequence set until the calculated yield is less than or equal to the screening condition after any one materialized view is created.
Or:
obtaining at least one historically materialized view according to the automatic creation method of the materialized view;
acquiring a query statement;
extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
matching an optimal materialized view from all the historical materialized views according to the matching condition;
and obtaining a query result from the optimal materialized view according to the query statement.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with any one of the above embodiments of the method for automatically creating a materialized view or the method for quickly querying based on a materialized view, the embodiments of the present application may be implemented by a computer program product. The computer program product comprises software code portions for performing a method for implementing the automatic creation of a materialized view or a fast query method based on materialized views as described in any of the above embodiments when the computer program product is run on a computer.
In addition, in combination with any one of the above embodiments of the method for automatically creating a materialized view or the method for quickly querying based on a materialized view, embodiments of the present application may provide a readable storage medium to implement the method. The readable storage medium having stored thereon a computer program; when executed by a processor, the computer program realizes the automatic creation method of any materialized view or the quick query method based on the materialized view in the embodiment.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also called program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (9)

1. A method for automatically creating materialized views is characterized by comprising the following steps:
obtaining historical query records, wherein the historical query records comprise at least one historical query statement, and any historical query statement relates to at least one field of at least one basic data table;
mining at least one frequent item set from all the historical query statements, wherein the frequent item set is a set of fields or field combinations having strong relevance to at least one of the base data tables, defining a support degree as a frequency of occurrence of a disjoint subset A, B in a data set D at the same time, defining a confidence degree as a frequency of occurrence of B at the same time when A occurs, and integrating items with a support degree greater than a minimum support degree threshold and a confidence degree greater than a minimum confidence degree threshold as the frequent item set;
all frequent item sets related to the same basic data table are ordered from strong to weak according to relevance to obtain a frequent item set sequence set, the strength of the relevance is determined by the magnitude of support, and the relevance of the frequent item sets with higher support is stronger;
sequentially creating corresponding materialized views according to the frequent item set sequence set for the first time, calculating corresponding profitability and deleting the materialized views after each materialized view is created, and obtaining screening conditions according to all the profitability, wherein the profitability is the ratio of space occupation change and query speed change after any one materialized view is created;
and sequentially creating corresponding materialized views for the second time according to the frequent item set sequence set until the calculated yield is less than or equal to the screening condition after any one materialized view is created.
2. The method of automatic creation of a materialized view according to claim 1, wherein the change in space occupation is a ratio of a difference between a storage space before and after creation of any one of the materialized views to a storage space after creation of the materialized view; the query speed change is the ratio of the difference between the query speeds before and after the establishment of any materialized view to the query speed after the establishment of the materialized view.
3. The method of automatic creation of materialized views according to claim 1 wherein the culling condition is the median of all the yield rates.
4. The method of automatic creation of materialized views according to claim 1 wherein the field is an included field or a filtered field; the field combinations are included field combinations or filtered field combinations.
5. The method of automatic creation of materialized views according to claim 1 wherein the data in each of said materialized views is kept consistent with the data in the corresponding base data table.
6. The method for automatically creating a materialized view according to claim 1, wherein before the step of sequentially creating the corresponding materialized view according to the frequent itemset order set for the first time, the method comprises the following steps: and judging whether the sum of the data volumes of all the basic data tables related to each frequent item set reaches a preset data volume, and if not, deleting the frequent item set from the frequent item set sequence set.
7. The method for automatically creating the materialized view according to claim 1, wherein when the corresponding materialized view is created for the first time according to the frequent item set sequence set in sequence, the method comprises the following steps: before any materialized view is created, whether the materialized view exists or not is judged, and if the materialized view exists, the frequent item set is deleted from the frequent item set sequence set.
8. The method for automatically creating the materialized view according to claim 1, wherein when the corresponding materialized view is created according to the frequent itemset order set for the first time, the method comprises the following steps: before any materialized view is created, whether the residual space meets the requirement of creating the materialized view is judged, and if not, the frequent item set is deleted from the frequent item set sequence set.
9. A fast query method based on materialized views is characterized by comprising the following steps:
the method for automatic creation of materialized views according to any one of claims 1 to 8 obtaining at least one historically materialized view;
acquiring a query statement;
extracting at least one data table and at least one field or at least one field combination which are involved from the query statement as matching conditions;
matching an optimal materialized view from all the historical materialized views according to the matching condition, wherein the optimal materialized view is a materialized view which comprises the least number of fields in the materialized view which comprises all the fields or the data corresponding to all the field combinations; or when the fields in the query statement comprise the filter fields, the optimal materialized view is a materialized view which comprises all the fields or data corresponding to all the field combinations and is sorted by adopting any filter field; or the optimal materialized view simultaneously meets the two conditions;
and obtaining a query result from the optimal materialized view according to the query statement.
CN202210164998.3A 2022-02-23 2022-02-23 Materialized view automatic creation method and materialized view based quick query method Active CN114218263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210164998.3A CN114218263B (en) 2022-02-23 2022-02-23 Materialized view automatic creation method and materialized view based quick query method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210164998.3A CN114218263B (en) 2022-02-23 2022-02-23 Materialized view automatic creation method and materialized view based quick query method

Publications (2)

Publication Number Publication Date
CN114218263A CN114218263A (en) 2022-03-22
CN114218263B true CN114218263B (en) 2022-05-13

Family

ID=80709360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210164998.3A Active CN114218263B (en) 2022-02-23 2022-02-23 Materialized view automatic creation method and materialized view based quick query method

Country Status (1)

Country Link
CN (1) CN114218263B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108076B (en) * 2023-04-10 2023-07-18 之江实验室 Materialized view query method, materialized view query system, materialized view query equipment and storage medium
CN116541377A (en) * 2023-04-27 2023-08-04 阿里巴巴(中国)有限公司 Processing method and system of materialized view of task and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250457A (en) * 2016-07-28 2016-12-21 国网山东省电力公司电力科学研究院 The inquiry processing method of big data platform Materialized View and system
JP2019105934A (en) * 2017-12-11 2019-06-27 Kddi株式会社 Inquiry sentence execution device, inquiry sentence execution method, and inquiry sentence execution program
CN112286953A (en) * 2020-09-25 2021-01-29 北京邮电大学 Multidimensional data query method and device and electronic equipment
US10963456B2 (en) * 2019-04-19 2021-03-30 Timescale, Inc. Querying of materialized views for time-series database analytics
CN113111097A (en) * 2021-05-12 2021-07-13 国家海洋信息中心 Method for realizing high-speed query of ocean data by using distributed database technology
CN113986933A (en) * 2021-09-03 2022-01-28 北京火山引擎科技有限公司 Materialized view creating method and device, storage medium and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120066248A1 (en) * 2010-09-13 2012-03-15 Stimson Stephen R Dynamic Creation of Materialized Database Views
US9785684B2 (en) * 2014-06-05 2017-10-10 International Business Machines Corporation Determining temporal categories for a domain of content for natural language processing
CN105159952A (en) * 2015-08-14 2015-12-16 安徽新华博信息技术股份有限公司 Data processing method based on frequent item set mining
CN111597209B (en) * 2020-04-30 2023-11-14 清华大学 Database materialized view construction system, method and system creation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250457A (en) * 2016-07-28 2016-12-21 国网山东省电力公司电力科学研究院 The inquiry processing method of big data platform Materialized View and system
JP2019105934A (en) * 2017-12-11 2019-06-27 Kddi株式会社 Inquiry sentence execution device, inquiry sentence execution method, and inquiry sentence execution program
US10963456B2 (en) * 2019-04-19 2021-03-30 Timescale, Inc. Querying of materialized views for time-series database analytics
CN112286953A (en) * 2020-09-25 2021-01-29 北京邮电大学 Multidimensional data query method and device and electronic equipment
CN113111097A (en) * 2021-05-12 2021-07-13 国家海洋信息中心 Method for realizing high-speed query of ocean data by using distributed database technology
CN113986933A (en) * 2021-09-03 2022-01-28 北京火山引擎科技有限公司 Materialized view creating method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种基于查询聚类的物化视图动态调整策略;冯霞等;《北京邮电大学学报》;20180831;第41卷(第4期);第16-22页 *

Also Published As

Publication number Publication date
CN114218263A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN114218263B (en) Materialized view automatic creation method and materialized view based quick query method
CN110321344B (en) Information query method and device for associated data, computer equipment and storage medium
US20200057958A1 (en) Identification and application of hyperparameters for machine learning
US11093461B2 (en) Method for computing distinct values in analytical databases
EP3217296A1 (en) Data query method and apparatus
US11074242B2 (en) Bulk data insertion in analytical databases
US20140236965A1 (en) Feature generation and model selection for generalized linear models
US20190034466A1 (en) Database archiving method and device for creating index information and method and device of retrieving archived database including index information
AU2016394743A1 (en) Method and apparatus for archiving database, and method and apparatus for searching archived database
US20180011923A1 (en) Value range synopsis in column-organized analytical databases
CN113849499A (en) Data query method and device, storage medium and electronic device
CN109033295B (en) Method and device for merging super-large data sets
CN111125158B (en) Data table processing method, device, medium and electronic equipment
CN106649385A (en) Data ranking method and device based on HBase database
US9235578B2 (en) Data partitioning apparatus and data partitioning method
CN116010670A (en) Data catalog recommendation method, device and application based on data blood relationship
CN116955341A (en) Database integrity evaluation method, system and application thereof
CN115794861A (en) Offline data query multiplexing method based on feature abstract and application thereof
CN113535962B (en) Data warehouse-in method, device, electronic device, program product and storage medium
CN106776704B (en) Statistical information collection method and device
CN114153925A (en) Data table association analysis method and device
CN113988670A (en) Comprehensive enterprise credit risk early warning method and system
CN112527824B (en) Paging query method, paging query device, electronic equipment and computer-readable storage medium
CN109739817B (en) Method and system for storing data file in big data storage system
CN112529319A (en) Grading method and device based on multi-dimensional features, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant