WO2018090557A1 - Method and device for querying data table - Google Patents
Method and device for querying data table Download PDFInfo
- Publication number
- WO2018090557A1 WO2018090557A1 PCT/CN2017/081321 CN2017081321W WO2018090557A1 WO 2018090557 A1 WO2018090557 A1 WO 2018090557A1 CN 2017081321 W CN2017081321 W CN 2017081321W WO 2018090557 A1 WO2018090557 A1 WO 2018090557A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- inequality
- trunc
- query condition
- data table
- equation
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24547—Optimisations to support specific applications; Extensibility of optimisers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Definitions
- the embodiments of the present application relate to the field of computers, and in particular, to a method and apparatus for querying a data table in the field of computers.
- the database management system is a layer of database management software between the user and the operating system. Database tuning can make the database application run faster. The goal is to make the database have higher throughput and more. Short response time.
- the database kernel it mainly implements the optimization technology of structured query language (SQL), including: query reuse technology, query rewrite rule technology, query algorithm optimization technology, parallel query optimization technology, distributed query. Optimization technology and other optimization techniques, but in the parallel computing database (MPPDB) of parallel computing, the optimizer module estimates the value and the cost of executing the job according to the type of the statement in the execution job when performing job optimization.
- SQL structured query language
- MPPDB parallel computing database
- the data distribution on each data node, etc. generates an optimal execution plan, but the optimal execution plan generated by the optimizer module does not necessarily achieve efficient optimization, for example, the data push service passengers in the airline ticketing system, For example, in the database of the aviation system, passengers A and B who boarded the aircraft at the same time within 60 minutes of time were found, but during the execution, it was found that there were hundreds of millions of passengers boarding records within the same time difference. Efficiency when querying different passengers and time differences in the database Low.
- the method for querying the data table provided by the embodiment of the present application can improve the efficiency of querying the data table.
- a method for querying a data table comprising: obtaining an initial query condition, the initial query condition including a first inequality; converting the first inequality to obtain a target query condition, The target query condition includes a first equation; and the data table is queried according to the target query condition.
- the first inequality is converted to obtain a target query condition, and the first inequality may be converted to obtain a first equation.
- the initial query condition is converted to obtain a target query condition, where the initial query condition includes a first inequality, and the target query condition includes a first equation, such that by converting the first inequality to the first class
- the first inequality is more inefficient, so the query efficiency is lower, but the query is transformed by the transformed equation, and the query scope is narrowed, thereby improving the efficiency of the query.
- the target query condition further includes a second inequality different from the first inequality.
- the target query condition further includes a second inequality, that is, the first inequality in the initial query condition may be converted into the first equation and the second inequality, and may be inquired according to the first equation and the second inequality.
- the data table in this way, can further narrow the scope of the query by the second inequality, and improve the query efficiency.
- the initial query condition further includes a second equation, wherein the according to the target
- the query condition query data table includes: querying the data table according to the first equation and the second equation.
- the query method in the embodiment of the present application may be a hash join or a self-join.
- the second query may be included in the initial query condition in the hashjoin query method.
- the second equation can be solved in the hashjoin algorithm processing. After filtering, each hash bucket corresponds to a huge amount of data, and the appropriate data is filtered by the first equation, the second inequality, and the second equation to reduce the amount of data in the hash bucket.
- C is a positive integer greater than 1
- n is an integer, that is, it can be a positive integer or a negative integer
- a and B are the values in the data table
- a and B are positive integers.
- m is an integer, that is, it can be a positive integer or a negative integer
- D and E are the values in the data table
- D and E are positive integers.
- the initial query condition is a self-joined selfjoin initial query condition.
- the query of selfjoin initial query condition means that a table can be connected with itself, which can simplify the complexity of the query and improve the efficiency of the query.
- the method before converting the first inequality to obtain a target query condition, further includes: determining whether the data amount included in the data table is greater than a first threshold; The first inequality is converted to obtain the target query condition, including: when the data amount included in the data table is greater than the first threshold, converting the first inequality to obtain the target query condition.
- an apparatus for querying a data table is provided for performing the method of the first aspect or any possible implementation of the first aspect.
- an apparatus for querying a data table comprising: a receiver, a transmitter, a memory, a processor, and a bus system.
- the receiver, the transmitter, the memory and the processor are connected by the bus system, the memory is for storing instructions for executing the instructions stored by the memory to control the receiver to receive signals and control the sending
- the transmitter transmits a signal, and when the processor executes the memory stored instructions, the execution causes the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.
- an apparatus for querying a data table comprising: a memory and a processor.
- the memory is for storing computer executable instructions
- the processor is for reading the computer executable instructions and may perform the method of the first aspect or any possible implementation of the first aspect.
- a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.
- FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
- FIG. 2 shows a schematic diagram of a method of querying a data table according to an embodiment of the present application.
- FIG. 3 shows a schematic diagram of another method of querying a data table according to an embodiment of the present application.
- FIG. 4 shows a schematic diagram of an apparatus for querying a data table according to an embodiment of the present application.
- FIG. 5 shows a schematic diagram of another apparatus for querying a data table according to an embodiment of the present application.
- RDBMS relational database management system
- NoSQL non-relational
- MPP-DB massively parallel processing database
- the embodiment of the present application is described by taking the SQL language as an example, but the embodiment of the present application may also adopt other languages, for example, an object-oriented query language (HQL), and the like. This is not limited.
- HTL object-oriented query language
- the database system generally includes four components: database, hardware, software, and personnel.
- a database refers to a collection of organized, shareable data that is stored on a storage medium internal or external to a computer. The data in the database is organized and described according to a certain mathematical model. Description and storage, with less redundancy, higher data independence and scalability, and can be shared by various users.
- Hardware refers to the various physical devices that make up a computer system, that is, the internal devices required for storage, as well as the external devices required for storage. The configuration of the hardware should meet the needs of the entire database system.
- the software includes the operating system, DBMS, and applications.
- DBMS is the core software of the database system. Scientifically organizes, stores data and efficiently acquires and maintains data with the support of the operating system. It can enable users to create, modify, optimize or query data in the database through different methods, for example.
- the DBMS can include an optimizer module for logical query optimization.
- Figure 1 shows the process of querying data.
- the query process for describing SQL statements includes the following steps:
- S102 Analyze the initial query condition of the SQL, and convert it into a query tree (Query Tree) through the lexical analysis, the syntax analysis, and the semantic check to pass to the next stage.
- the description of the embodiment of the present application is conveniently described by taking an SQL statement as an example. It should be understood that the embodiment of the present application may be described by using other initial query conditions.
- the form of the initial query condition may be a SQL-like statement, such as “select name from personal basics. Information where age > 30".
- the form of the initial query condition may also be a natural language, such as "inquiring the name of a person whose age is greater than 30 years old in the basic information of the individual".
- S106 Perform logic optimization query when the aggregation operation is not needed, for example, using a query technology to perform an equivalent exchange query.
- S106 may be before or after S107, and the embodiment of the present application does not limit this.
- S108 group, sort, aggregate, and de-optimize the plan obtained by the logic optimization query and the physical optimization query.
- the embodiment of the present application mainly describes the logic optimization query in the step S106.
- the amount of data stored and queried by the massive user data is very large, and the distributed database is usually used to segment the data to improve the query performance of the system. .
- the distributed database does not solve the problem of massive user data query.
- the number of users can reach 1 billion, and the attributes of users can reach 1 million.
- Such data can be stored in a traditional database, which can reach 10 billion columns and millions of rows.
- the amount of data queried is still very large.
- the embodiment of the present application can optimize the query condition of the massive data.
- the optimal query plan can be understood as the shortest query technology. The following describes the query optimization method of the embodiment of the present application.
- FIG. 2 illustrates a query optimization method 200 of an embodiment of the present application.
- the method 200 can be performed by an optimizer in a database system.
- the method 200 includes:
- the initial query condition is a self-join initial query condition
- the self-join initial query condition refers to a table that can be connected with itself, so that the complexity of the query can be simplified, and the query efficiency is improved.
- the first The second inequality is specifically A>B, where trunc( ⁇ ) is a truncated rounding operation, C is a positive integer greater than 1, and n is an integer, that is, it can be a positive integer or a negative integer, and A and B are The values in the data table, and A and B are positive integers.
- the second inequality Specifically, D>E, where trunc( ⁇ ) is a truncated rounding operation, F is a positive integer greater than 1, and m is an integer, that is, a positive integer or a negative integer, and D and E are the data tables.
- the second inequality Specifically, D>E, where trunc( ⁇ ) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers
- the data table may be queried according to the first equation in the target query condition.
- the initial query condition includes a second equation
- the S230 includes: querying a data table according to the first equation and the second equation, or the S230 includes: The data table query is queried according to the first equation, the second equation, and the second inequality.
- the optimizer module generates an optimal execution plan according to the SQL statement, needs the underlying scan data, and then filters the scanned data through the hashjoin to filter the appropriate data, wherein the second equation needs to be included in the hashjoin query process.
- the first inequality is included in the initial query condition, the first inequality needs to be rewritten into the second inequality and the first inequal, the second inequality of the hashjoin, the second inequality after the rewriting, and the first equation are used.
- the query of the statement can solve the problem that the amount of data corresponding to each hash bucket is huge after filtering by the second equation in the hashjoin algorithm processing, and filtering the appropriate data by the first equation, the second inequality and the second equation, and reducing The amount of data in the hash bucket.
- the SQL target query conditions of passengers A and B who are boarding in the same place within 60 minutes are found as follows:
- FIG. 3 depicts a method 300 of querying a data table, the method 300 including:
- S303 if it is a selfjoin method, determine a query cost, and determine whether the number of estimated cost rows is large, for example, determining that the number of rows is greater than a certain preset threshold, when the number of rows is less than a preset threshold, or for example, as described above: When the number of values in the data table is greater than the first threshold, if the amount of data is considered to be large, then the process jumps to S307.
- S304 In S303, if the number of rows is greater than a preset threshold, determine whether the initial query condition is a convertible non-equivalent expression. If there is no convertible non-equivalent expression in the initial query condition, skip to S307, for example, the non-equivalent expression is the aforementioned first inequality.
- the target query condition may include an equivalent expression; or the target query condition may include another non-equivalent expression and an equivalent expression.
- the query efficiency is low, by converting the first inequality into the second inequality and the first
- the one-element condition increases the conditions for querying data, which further improves the efficiency of querying data.
- FIG. 4 shows an apparatus 300 for querying a data table provided by an embodiment of the present application.
- the apparatus 300 includes:
- the obtaining module 410 is configured to obtain an initial query condition, where the initial query condition includes a first inequality
- the conversion module 420 is configured to perform conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation;
- the query module 430 is configured to query the data table according to the target query condition.
- the initial query condition further includes a second equation
- the query module 430 is specifically configured to: query the data table according to the first equation and the second equation.
- the target query condition further includes a second unequal difference from the first inequality formula.
- the first The second inequality is specifically A>B, where trunc( ⁇ ) is a truncated rounding operation, C is a positive integer greater than 1, n is an integer, A and B are the values in the data table, and A and B are A positive integer.
- the second inequality Specifically, D>E, where trunc( ⁇ ) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers .
- the second inequality is G ⁇ H, where trunc( ⁇ ) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, G and H are values in the data table, and G and H is a positive integer.
- the initial query condition is a self-joining selfjoin initial query condition.
- the conversion module 420 is further configured to: before converting the first inequality to obtain a target query condition, determining whether the data amount included in the data table is greater than a first threshold; When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
- the apparatus 400 herein is embodied in the form of a functional module.
- module as used herein may refer to an ASIC, an electronic circuit, a processor (eg, a shared processor, a proprietary processor or a group processor, etc.) and memory, a merge logic, and a processor for executing one or more software or firmware programs. / or other suitable components that support the described functionality.
- the device 400 may be specifically the optimizer in the foregoing embodiment, and the device 400 may be used to execute various processes and/or steps corresponding to the optimizer in the foregoing method embodiments. To avoid repetition, we will not repeat them here.
- FIG. 5 shows an apparatus 500 for querying a data table provided by an embodiment of the present application.
- the apparatus 500 includes a receiver 510, a processor 520, a transmitter 530, a memory 540, and a bus system 550.
- the receiver 510, the processor 520, the transmitter 530 and the memory 540 are connected by a bus system 550 for storing instructions for executing instructions stored in the memory 540 to control the receiver 510.
- a signal is received and the transmitter 530 is controlled to send an instruction.
- the receiver 510 and the transmitter 530 may be a transceiver interface, etc., which is not limited in this embodiment of the present application.
- the receiver 510 is configured to obtain an initial query condition, where the initial query condition includes a first inequality; the processor 520 performs conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation.
- the processor 520 is further configured to query the data table according to the target query condition.
- the initial query condition further includes a second equation
- the processor 520 is specifically configured to: query the data table according to the first equation and the second equation.
- the target query condition further includes a second inequality different from the first inequality.
- the first The second inequality is specifically A>B, where trunc( ⁇ ) is a truncated rounding operation, C is a positive integer greater than 1, n is an integer, A and B are the values in the data table, and A and B are A positive integer.
- the second inequality Specifically, D>E, where trunc( ⁇ ) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers .
- the second inequality is G ⁇ H, where trunc( ⁇ ) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, G and H are values in the data table, and G and H is a positive integer.
- the initial query condition is a self-joining selfjoin initial query condition.
- the processor 520 is further configured to: before converting the first inequality to obtain a target query condition, determining whether the data amount included in the data table is greater than a first threshold; When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
- the apparatus 500 may be specifically the optimizer in the above embodiments, and may be used to perform various steps and/or processes corresponding to the optimizer in the above method embodiments.
- the memory 540 can include read only memory and random access memory and provides instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory.
- the memory can also store information of the device type.
- the processor 520 can be configured to execute instructions stored in the memory, and when the processor executes the instructions, the processor 520 can perform the various steps corresponding to the optimizer in the above method embodiments.
- the disclosed system, apparatus, and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- there may be another division manner for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present application.
- each functional unit in the embodiment of the present application may be integrated into one processing unit, or may be each Units exist physically alone, or two or more units can be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the storage medium includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
- the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program code. .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a method and device for querying a data table. The method comprises: acquiring an initial query condition, wherein the initial query condition comprises a first inequality; performing conversion processing on the first inequality to obtain a target query condition, wherein the target query condition comprises a first equality; and querying a data table according to the target query condition. The present invention can improve the efficiency of data table querying.
Description
本申请要求于2016年11月18日提交中国专利局、申请号为201611022774.X、发明名称为“查询数据表的方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。The present application claims priority to Chinese Patent Application No. 201611022774.X filed on Nov. 18, 2016, the entire disclosure of which is incorporated herein by reference. In the application.
本申请实施例涉及计算机领域,特别涉及计算机领域中的查询数据表的方法和装置。The embodiments of the present application relate to the field of computers, and in particular, to a method and apparatus for querying a data table in the field of computers.
数据库管理系统(database management system,DBMS)是位于用户与操作系统之间的一层数据库管理软件,数据库调优可以使数据库应用运行得更快,其目标是使数据库有更高的吞吐量和更短的响应时间。The database management system (DBMS) is a layer of database management software between the user and the operating system. Database tuning can make the database application run faster. The goal is to make the database have higher throughput and more. Short response time.
对数据库内核来说,主要实现结构化查询语言(structured query language,SQL)层面的优化技术,包括:查询重用技术、查询重写规则技术、查询算法优化技术、并行查询的优化技术、分布式查询优化技术及其他优化技术等,但是,在并行计算的并行数据库集群(massively parallel processing database,MPPDB)中,优化器模块在执行作业优化时,根据执行作业中语句类型、执行作业的代价估算数值和各个数据节点上的数据分布等,生成最优执行计划,但是优化器模块生成的最优执行计划不一定能做到高效率的优化,例如,在航过公司票务系统中的数据推送服务乘客,例如,航空系统的数据库中找出在时间相差60分钟内,且同一地点登机的乘客A和乘客B,但是执行过程中发现,在同一差值时间内,有上亿条乘客登机记录,在数据库中查询不同的乘客及时间差值时,效率较低。For the database kernel, it mainly implements the optimization technology of structured query language (SQL), including: query reuse technology, query rewrite rule technology, query algorithm optimization technology, parallel query optimization technology, distributed query. Optimization technology and other optimization techniques, but in the parallel computing database (MPPDB) of parallel computing, the optimizer module estimates the value and the cost of executing the job according to the type of the statement in the execution job when performing job optimization. The data distribution on each data node, etc., generates an optimal execution plan, but the optimal execution plan generated by the optimizer module does not necessarily achieve efficient optimization, for example, the data push service passengers in the airline ticketing system, For example, in the database of the aviation system, passengers A and B who boarded the aircraft at the same time within 60 minutes of time were found, but during the execution, it was found that there were hundreds of millions of passengers boarding records within the same time difference. Efficiency when querying different passengers and time differences in the database Low.
发明内容Summary of the invention
本申请实施例提供的查询数据表的方法,可以提高查询数据表的效率。The method for querying the data table provided by the embodiment of the present application can improve the efficiency of querying the data table.
第一方面,提供了一种查询数据表的方法,该方法包括:获取初始查询条件,所述初始查询条件包括第一不等式;将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式;根据所述目标查询条件,查询数据表。In a first aspect, a method for querying a data table is provided, the method comprising: obtaining an initial query condition, the initial query condition including a first inequality; converting the first inequality to obtain a target query condition, The target query condition includes a first equation; and the data table is queried according to the target query condition.
具体,将所述第一不等式进行转换,得到目标查询条件,可以是:将所述第一不等式进行转换得到第一等式。Specifically, the first inequality is converted to obtain a target query condition, and the first inequality may be converted to obtain a first equation.
在本申请实施例中,将初始查询条件进行转换得到目标查询条件,该初始查询条件包括第一不等式,该目标查询条件包括第一等式,这样,通过将第一不等转化为第一等式,利用第一不等式进行查询,在一定的数据范围内,满足不等式的数据较多使得查询效率较低,但是通过转化后的等式进行查询,缩小查询范围,能够提高查询的效率。In the embodiment of the present application, the initial query condition is converted to obtain a target query condition, where the initial query condition includes a first inequality, and the target query condition includes a first equation, such that by converting the first inequality to the first class In the formula, using the first inequality to query, within a certain data range, the data satisfying the inequality is more inefficient, so the query efficiency is lower, but the query is transformed by the transformed equation, and the query scope is narrowed, thereby improving the efficiency of the query.
在某些实现方式中,所述目标查询条件还包括不同于所述第一不等式的第二不等式。In some implementations, the target query condition further includes a second inequality different from the first inequality.
在本申请实施例中,目标查询条件中还包括第二不等式,即可以将初始查询条件中的第一不等式转化为第一等式和第二不等式,可以根据第一等式和第二不等式查询数据表,这样,可以进一步通过第二不等式缩小查询范围,提高查询效率。In the embodiment of the present application, the target query condition further includes a second inequality, that is, the first inequality in the initial query condition may be converted into the first equation and the second inequality, and may be inquired according to the first equation and the second inequality. The data table, in this way, can further narrow the scope of the query by the second inequality, and improve the query efficiency.
在某些实现方式中,所述初始查询条件还包括第二等式,其中,所述根据所述目标
查询条件查询数据表,包括:根据所述第一等式和所述第二等式查询所述数据表。In some implementations, the initial query condition further includes a second equation, wherein the according to the target
The query condition query data table includes: querying the data table according to the first equation and the second equation.
可选地,本申请实施例的查询方法可以是哈希连接(hashjoin)或者是自连接(selfjoin),当是hashjoin时,在hashjoin查询方法中的初始查询条件中还可以包括第二等式,利用hashjoin的第二等式以及转化后的第一等式查询数据表,或者利用第二等式、第一等式和第二不等式查询数据表,可以解决在hashjoin算法处理时利用第二等式过滤后每个hash桶对应数据量巨大的问题,通过第一等式,第二不等式和第二等式过滤合适的数据,降低hash桶中的数据量。Optionally, the query method in the embodiment of the present application may be a hash join or a self-join. When it is a hashjoin, the second query may be included in the initial query condition in the hashjoin query method. Using the second equation of hashjoin and the transformed first equation to query the data table, or using the second equation, the first equation, and the second inequality to query the data table, the second equation can be solved in the hashjoin algorithm processing. After filtering, each hash bucket corresponds to a huge amount of data, and the appropriate data is filtered by the first equation, the second inequality, and the second equation to reduce the amount of data in the hash bucket.
在某些实现方式中,若所述第一不等式的形式为:A-B>C,则所述将所述第一不等式进行转换处理,得到目标查询条件,包括:将A-B>C转化为trunc(A/C)=trunc(B/C)+n,其中,所述第一等式具体为trunc(A/C)=trunc(B/C)+n,trunc(·)为将数字截尾取整的运算,C为大于1的正整数,n为整数,即可以为正整数也可以为负整数,A和B为所述数据表中的数值,并且A和B为正整数。In some implementations, if the form of the first inequality is: AB>C, the converting the first inequality to obtain a target query condition, including: converting AB>C into trunc (A) /C)=trunc(B/C)+n, where the first equation is specifically trunc(A/C)=trunc(B/C)+n, and trunc(·) is a truncation of the number For the operation, C is a positive integer greater than 1, and n is an integer, that is, it can be a positive integer or a negative integer, A and B are the values in the data table, and A and B are positive integers.
在某些实现方式中,若所述第一不等式的形式为:A-B>C,则所将所述第一不等式进行转换处理,得到目标查询条件,包括:将A-B>C转化为trunc(A/C)=trunc(B/C)+n和A>B,其中,所述第二不等式具体为A>B,所述第一等式具体为trunc(A/C)=trunc(B/C)+n,trunc(·)为将数字截尾取整的运算,C为大于1的正整数,n为整数,即可以为正整数也可以为负整数,A和B为所述数据表中的数值,并且A和B为正整数。In some implementations, if the form of the first inequality is: AB>C, the first inequality is converted to obtain a target query condition, including: converting AB>C into trunc (A/ C)=trunc(B/C)+n and A>B, wherein the second inequality is specifically A>B, and the first equation is specifically trunc(A/C)=trunc(B/C) +n, trunc(·) is an operation that rounds the digits, C is a positive integer greater than 1, and n is an integer, which can be a positive integer or a negative integer, and A and B are in the data table. Value, and A and B are positive integers.
在某些实现方式中,若所述第一不等式的形式为:D-E<F,则将所述第一不等式进行转换处理,得到目标查询条件,包括:将D-E<F转化为trunc(D/F)=trunc(E/F)+m,所述第一等式具体为trunc(D/F)=trunc(E/F)+m,为将数字截尾取整运算,F为大于1的正整数,m为整数,即可以为正整数也可以为负整数,D和E为所述数据表中的数值,并且D和E为正整数。In some implementations, if the form of the first inequality is: DE<F, the first inequality is converted to obtain a target query condition, including: converting DE<F into trunc (D/F) )=trunc(E/F)+m, the first equation is specifically trunc(D/F)=trunc(E/F)+m, which is a rounding of the number, and F is a positive value greater than 1. An integer, m is an integer, that is, it can be a positive integer or a negative integer, D and E are the values in the data table, and D and E are positive integers.
在某些实现方式中,若所述第一不等式的形式为:D-E<F,则所述将所述第一不等式进行转换处理,得到目标查询条件,包括:将D-E<F转化为trunc(D/F)=trunc(E/F)+m和D>E,其中,所述第二不等式具体为D>E,所述第一等式具体为trunc(D/F)=trunc(E/F)+m,为将数字截尾取整运算,F为大于1的正整数,m为整数,即可以为正整数也可以为负整数,D和E为所述数据表中的数值,并且D和E为正整数。In some implementations, if the form of the first inequality is: DE<F, the converting the first inequality to obtain a target query condition, including: converting DE<F into trunc(D /F)=trunc(E/F)+m and D>E, wherein the second inequality is specifically D>E, and the first equation is specifically trunc(D/F)=trunc(E/F +m, for the truncation of the number, F is a positive integer greater than 1, m is an integer, that is, it can be a positive integer or a negative integer, D and E are the values in the data table, and D And E is a positive integer.
在某些实现方式中,若所述第一不等式的形式为:G-H<I,则所述将所述第一不等式进行转换处理,得到目标查询条件,包括:将G-H<I转化为trunc(G/I)=trunc(H/I)+p,其中,所述第一等式具体为trunc(G/I)=trunc(H/I)+p,为将数字截尾取整运算,I为大于1的正整数,p为整数,即可以为正整数也可以为负整数,G和H为所述数据表中的数值,并且G和H为正整数。In some implementations, if the form of the first inequality is: GH<I, the converting the first inequality to obtain a target query condition, including: converting GH<I into trunc (G) /I)=trunc(H/I)+p, where the first equation is specifically trunc(G/I)=trunc(H/I)+p, which is a truncation of the number, I is A positive integer greater than 1, p is an integer, that is, a positive integer or a negative integer, G and H are values in the data table, and G and H are positive integers.
在某些实现方式中,若所述第一不等式的形式为:G-H<I,则所述将所述第一不等式进行转换处理,得到目标查询条件,包括:将G-H<I转化为trunc(G/I)=trunc(H/I)+p和G<H,其中,所述第二不等式具体为G<H,所述第一等式具体为trunc(G/I)=trunc(H/I)+p,为将数字截尾取整,I为大于1的正整数,p为整数,即可以为正整数也可以为负整数,G和H为所述数据表中的数值,并且G和H为正整数。In some implementations, if the form of the first inequality is: GH<I, the converting the first inequality to obtain a target query condition, including: converting GH<I into trunc (G) /I)=trunc(H/I)+p and G<H, wherein the second inequality is specifically G<H, and the first equation is specifically trunc(G/I)=trunc(H/I +p, to round the digits, I is a positive integer greater than 1, p is an integer, ie it can be a positive integer or a negative integer, G and H are the values in the data table, and G and H is a positive integer.
在某些实现方式中,所述初始查询条件为自连接selfjoin初始查询条件。selfjoin初始查询条件的查询是指对一张表可以和自身进行连接,这样,可以简化查询的复杂度,提高查询的效率。
In some implementations, the initial query condition is a self-joined selfjoin initial query condition. The query of selfjoin initial query condition means that a table can be connected with itself, which can simplify the complexity of the query and improve the efficiency of the query.
在某些实现方式中,在将所述第一不等式进行转换处理,得到目标查询条件之前,所述方法还包括:判断所述数据表包括的数据量是否大于第一阈值;所述将所述第一不等式进行转换处理,得到目标查询条件,包括:当所述数据表包括的数据量大于所述第一阈值时,将所述第一不等式进行转换处理,得到所述目标查询条件。In some implementations, before converting the first inequality to obtain a target query condition, the method further includes: determining whether the data amount included in the data table is greater than a first threshold; The first inequality is converted to obtain the target query condition, including: when the data amount included in the data table is greater than the first threshold, converting the first inequality to obtain the target query condition.
这样,对于海量数据的查询,若只采用第一不等式进行查询,则需要遍历数据表中所有数据,找出满足第一不等式的数据,这样数据量越大,查询时间会成倍的增加,通过将第一不等式转化第二不等式和第一等式,在遍历数据表中的数据时,找出满足第二不等式和第一等式条件的数据,这样会降低查询的复杂度,减少查询的时间,进一步提高查询的效率。In this way, for the query of massive data, if only the first inequality is used for querying, it is necessary to traverse all the data in the data table to find the data satisfying the first inequality, so that the larger the amount of data, the more the query time will increase. Converting the first inequality to the second inequality and the first equation, when traversing the data in the data table, finding data satisfying the second inequality and the first equality condition, which reduces the complexity of the query and reduces the query time. To further improve the efficiency of the query.
第二方面,提供了一种查询数据表的装置,用于执行第一方面或第一方面任意可能的实现方式中的方法。In a second aspect, an apparatus for querying a data table is provided for performing the method of the first aspect or any possible implementation of the first aspect.
第三方面,提供了一种查询数据表的装置,该装置包括:接收器、发送器、存储器、处理器和总线系统。其中,该接收器、该发送器、该存储器和该处理器通过该总线系统相连,该存储器用于存储指令,该处理器用于执行该存储器存储的指令,以控制接收器接收信号,并控制发送器发送信号,并且当该处理器执行该存储器存储的指令时,该执行使得该处理器执行第一方面或第一方面的任意可能的实现方式中的方法。In a third aspect, an apparatus for querying a data table is provided, the apparatus comprising: a receiver, a transmitter, a memory, a processor, and a bus system. Wherein the receiver, the transmitter, the memory and the processor are connected by the bus system, the memory is for storing instructions for executing the instructions stored by the memory to control the receiver to receive signals and control the sending The transmitter transmits a signal, and when the processor executes the memory stored instructions, the execution causes the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.
第四方面,提供了一种查询数据表的装置,该装置包括:存储器和处理器。其中,存储器用于存储计算机可执行的指令,处理器用于读取所述计算机可执行的指令并可以执行第一方面或第一方面的任意可能的实现方式中的方法。In a fourth aspect, an apparatus for querying a data table is provided, the apparatus comprising: a memory and a processor. Wherein the memory is for storing computer executable instructions, the processor is for reading the computer executable instructions and may perform the method of the first aspect or any possible implementation of the first aspect.
第五方面,提供了一种计算机可读介质,用于存储计算机程序,该计算机程序包括用于执行第一方面或第一方面的任意可能的实现方式中的方法的指令。In a fifth aspect, a computer readable medium is provided for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.
图1示出了本申请实施例的应用场景的示意图。FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
图2示出了根据本申请实施例的查询数据表的方法示意图。FIG. 2 shows a schematic diagram of a method of querying a data table according to an embodiment of the present application.
图3示出了根据本申请实施例的另一查询数据表的方法示意图。FIG. 3 shows a schematic diagram of another method of querying a data table according to an embodiment of the present application.
图4示出了根据本申请实施例的查询数据表的装置示意图。FIG. 4 shows a schematic diagram of an apparatus for querying a data table according to an embodiment of the present application.
图5示出了根据本申请实施例的另一查询数据表的装置示意图。FIG. 5 shows a schematic diagram of another apparatus for querying a data table according to an embodiment of the present application.
应理解,本申请实施例的技术方案可以应用于各种数据库系统,例如,关系型数据库管理系统(relational database management system,RDBMS),非关系型(NoSQL)数据库系统,大规模并行处理数据库(massively parallel processing database,MPP-DB),等等,本申请实施例对此不做限定。It should be understood that the technical solutions of the embodiments of the present application can be applied to various database systems, for example, a relational database management system (RDBMS), a non-relational (NoSQL) database system, and a massively parallel processing database (massively Parallel processing database, MPP-DB), and the like, which are not limited by the embodiment of the present application.
还应理解,本申请实施例以SQL语言为例进行描述,但本申请实施例也可以采用其他语言,例如,面向对象的查询语言(hibernate query language,HQL),等等,本申请实施例对此不做限定。It should be understood that the embodiment of the present application is described by taking the SQL language as an example, but the embodiment of the present application may also adopt other languages, for example, an object-oriented query language (HQL), and the like. This is not limited.
下面先介绍一下数据库系统及其结构。数据库系统一般包括数据库、硬件、软件和人员四个组成部分。数据库(database,DB)是指长期存储在计算机内部或外部的存储介质上的、有组织、可共享的数据的集合。数据库中的数据按一定的数学模型组织、描
述和存储,具有较小的冗余,较高的数据独立性和易扩展性,并可为各种用户共享。硬件是指构成计算机系统的各种物理设备,即包括存储所需的内部设备,也包括存储所需的外部设备。硬件的配置应满足整个数据库系统的需要。软件包括操作系统、DBMS及应用程序。人员包括最终用户,他们利用系统的接口或查询语言访问数据库,如向数据库中添加数据、删除数据或查询数据。DBMS是数据库系统的核心软件,在操作系统的支持下科学地组织、存储数据和高效获取、维护数据,其可以使得用户可以通过不同的方法去建立、修改、优化或查询数据库中的数据,例如,DBMS可以包括优化器模块来进行逻辑查询优化。Let's first introduce the database system and its structure. The database system generally includes four components: database, hardware, software, and personnel. A database (database) refers to a collection of organized, shareable data that is stored on a storage medium internal or external to a computer. The data in the database is organized and described according to a certain mathematical model.
Description and storage, with less redundancy, higher data independence and scalability, and can be shared by various users. Hardware refers to the various physical devices that make up a computer system, that is, the internal devices required for storage, as well as the external devices required for storage. The configuration of the hardware should meet the needs of the entire database system. The software includes the operating system, DBMS, and applications. People include end users who access the database using the system's interface or query language, such as adding data to the database, deleting data, or querying data. DBMS is the core software of the database system. Scientifically organizes, stores data and efficiently acquires and maintains data with the support of the operating system. It can enable users to create, modify, optimize or query data in the database through different methods, for example. The DBMS can include an optimizer module for logical query optimization.
上述对数据库系统的介绍只是为了更好地理解本申请实施例的技术方案,不应对应用本申请实施例中的查询数据表的方法构成限制。The above description of the database system is only for a better understanding of the technical solution of the embodiment of the present application, and the method for applying the query data table in the embodiment of the present application should not be limited.
图1示出了查询数据的过程,例如,以开源数据库Postgres为例,描述SQL语句的查询过程包括以下步骤:Figure 1 shows the process of querying data. For example, taking the open source database Postgres as an example, the query process for describing SQL statements includes the following steps:
S101,开始查询。S101, start the query.
S102,分析SQL初始查询条件,通过经词法分析、语法分析和语义检查后转换为一颗查询树(Query Tree)传给下一个阶段。S102: Analyze the initial query condition of the SQL, and convert it into a query tree (Query Tree) through the lexical analysis, the syntax analysis, and the semantic check to pass to the next stage.
本申请实施例为例描述方便以SQL语句为例进行描述,应理解本申请实施例可以采用其他的初始查询条件进行描述,初始查询条件的形式可以是类SQL语句,例如“select姓名from个人基本信息where年龄>30”。或者,初始查询条件的形式也可以是自然语言,例如“查询个人基本信息中年龄大于30岁的人员姓名”。The description of the embodiment of the present application is conveniently described by taking an SQL statement as an example. It should be understood that the embodiment of the present application may be described by using other initial query conditions. The form of the initial query condition may be a SQL-like statement, such as “select name from personal basics. Information where age > 30". Alternatively, the form of the initial query condition may also be a natural language, such as "inquiring the name of a person whose age is greater than 30 years old in the basic information of the individual".
S103,根据S101得到的查询树进行视图重写,例如,可以用基表进行重写。S103, performing view rewriting according to the query tree obtained in S101, for example, rewriting with a base table.
S104,根据视图重写后的结果确定是否需要集合操作。S104. Determine, according to the result of the view rewriting, whether a set operation is required.
S105,当需要进行集合操作时,将集合分解为普通SQL,将分解后的普通SQL执行S106。S105. When a set operation is required, the set is decomposed into ordinary SQL, and the decomposed ordinary SQL is executed as S106.
S106,当不需要进行集合操作时,进行逻辑优化查询,例如,运用查询技术,进行等价交换查询。S106: Perform logic optimization query when the aggregation operation is not needed, for example, using a query technology to perform an equivalent exchange query.
S107,在进行逻辑优化查询之后,进行物理优化查询,例如,在多个查询路径中找出代价最小的查询路径。S107. After performing the logic optimization query, perform a physical optimization query, for example, find the least expensive query path among the multiple query paths.
应理解,S106可以在S107之前,或者之后,本申请实施例对此不作限制。It should be understood that S106 may be before or after S107, and the embodiment of the present application does not limit this.
S108,将逻辑优化查询和物理优化查询得到的计划进行分组、排序、聚集、去重优化。S108: group, sort, aggregate, and de-optimize the plan obtained by the logic optimization query and the physical optimization query.
S109,执行器执行计划。S109, the actuator executes the plan.
S110,返回执行器执行完的执行计划的查询结果。S110, returning a query result of the execution plan executed by the actuator.
应理解,图1所述的查询数据的过程只是为了更好地理解本申请实施例的技术方案,不应对应用本申请实施例中的查询数据表的方法构成限制。It should be understood that the process of querying the data described in FIG. 1 is only for a better understanding of the technical solution of the embodiment of the present application, and the method for applying the query data table in the embodiment of the present application is not limited.
本申请实施例主要描述S106步骤中的逻辑优化查询,例如海量的用户数据的存储和查询的数据量都非常大,通常情况会采用分布式数据库,对数据进行分片,以提高系统的查询性能。但是,从目前的使用情况来看,分布式数据库并没有很好地解决海量用户数据查询的问题。例如,用户的数量可达10亿,用户的属性可达100万,这样的数据存储在传统的数据库中,可达到百亿列、百万行。在这样的业务场景下,即使采用分布式数据库,查询的数据量依然非常大。本申请实施例可以针对海量数据的查询条件进行优
化,以降低查询的数据量,通常情况下最优的查询计划可以理解为查询时间最短的查询技术。下面针对本申请实施例的查询优化方法进行描述。The embodiment of the present application mainly describes the logic optimization query in the step S106. For example, the amount of data stored and queried by the massive user data is very large, and the distributed database is usually used to segment the data to improve the query performance of the system. . However, from the current usage situation, the distributed database does not solve the problem of massive user data query. For example, the number of users can reach 1 billion, and the attributes of users can reach 1 million. Such data can be stored in a traditional database, which can reach 10 billion columns and millions of rows. In such a business scenario, even with a distributed database, the amount of data queried is still very large. The embodiment of the present application can optimize the query condition of the massive data.
In order to reduce the amount of data in the query, the optimal query plan can be understood as the shortest query technology. The following describes the query optimization method of the embodiment of the present application.
图2示出了本申请实施例的查询优化方法200,例如该方法200可以由数据库系统中优化器执行,该方法200包括:FIG. 2 illustrates a query optimization method 200 of an embodiment of the present application. For example, the method 200 can be performed by an optimizer in a database system. The method 200 includes:
S210,获取初始查询条件,所述初始查询条件包括第一不等式。S210. Acquire an initial query condition, where the initial query condition includes a first inequality.
作为一个可选实施例,所述初始查询条件为selfjoin初始查询条件,selfjoin初始查询条件的查询是指对一张表可以和自身进行连接,这样,可以简化查询的复杂度,提高查询的效率。As an optional embodiment, the initial query condition is a self-join initial query condition, and the self-join initial query condition refers to a table that can be connected with itself, so that the complexity of the query can be simplified, and the query efficiency is improved.
S220,将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式。S220. Perform conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation.
作为一个可选实施例,若所述第一不等式的形式为:A-B>C,则所述第一等式具体为:trunc(A/C)=trunc(B/C)+n,所述第二不等式具体为A>B,其中,trunc(·)为截尾取整运算,C为大于1的正整数,n为整数,即可以为正整数也可以为负整数,A和B为所述数据表中的数值,并且A和B为正整数。可选地,S220,包括:将A-B>C转化为trunc(A/C)=trunc(B/C)+n,或者S220,包括:将A-B>C转化为trunc(A/C)=trunc(B/C)+n和A>B,其中,所述第二不等式为A>B,所述第一等式为trunc(A/C)=trunc(B/C)+n。As an optional embodiment, if the form of the first inequality is: AB>C, the first equation is specifically: trunc(A/C)=trunc(B/C)+n, the first The second inequality is specifically A>B, where trunc(·) is a truncated rounding operation, C is a positive integer greater than 1, and n is an integer, that is, it can be a positive integer or a negative integer, and A and B are The values in the data table, and A and B are positive integers. Optionally, S220, comprising: converting AB>C to trunc(A/C)=trunc(B/C)+n, or S220, including: converting AB>C into trunc(A/C)=trunc( B/C)+n and A>B, wherein the second inequality is A>B, and the first equation is trunc(A/C)=trunc(B/C)+n.
作为一个可选实施例,若所述第一不等式的形式为:D-E<F,所述第一等式具体为:trunc(D/F)=trunc(E/F)+m所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,即可以为正整数也可以为负整数,D和E为所述数据表中的数值,并且D和E为正整数;可选地,S220,包括:将D-E<F转化为trunc(D/F)=trunc(E/F)+m;或者S220,包括将D-E<F转化为trunc(D/F)=trunc(E/F)+m和D>E,其中,所述第二不等式为D>E,所述第一等式为trunc(D/F)=trunc(E/F)+m。As an optional embodiment, if the form of the first inequality is: DE<F, the first equation is specifically: trunc(D/F)=trunc(E/F)+m, the second inequality Specifically, D>E, where trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, and m is an integer, that is, a positive integer or a negative integer, and D and E are the data tables. a value in , and D and E are positive integers; optionally, S220, including: converting DE<F into trunc(D/F)=trunc(E/F)+m; or S220, including DE<F Converted to trunc(D/F)=trunc(E/F)+m and D>E, wherein the second inequality is D>E, and the first equation is trunc(D/F)=trunc( E/F)+m.
作为一个可选实施例,若所述第一不等式的形式为:G-H<I,所述第一等式具体为:trunc(D/F)=trunc(E/F)+m所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,D和E为所述数据表中的数值,并且D和E为正整数;可选地S220,包括:将G-H<I转化为trunc(G/I)=trunc(H/I)+p;或者S220包括:将G-H<I转化为trunc(G/I)=trunc(H/I)+p和G<H,其中,所述第二不等式为G<H,所述第一等式为trunc(G/I)=trunc(H/I)+p。As an optional embodiment, if the form of the first inequality is: GH<I, the first equation is specifically: trunc(D/F)=trunc(E/F)+m, the second inequality Specifically, D>E, where trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers Optionally, S220, comprising: converting GH<I into trunc(G/I)=trunc(H/I)+p; or S220 comprising: converting GH<I into trunc(G/I)=trunc(H) /I) +p and G<H, wherein the second inequality is G<H, and the first equation is trunc(G/I)=trunc(H/I)+p.
S230,根据所述目标查询条件,查询数据表,具体地,可以根据所述目标查询条件中的第一等式,查询数据表。S230. Query the data table according to the target query condition. Specifically, the data table may be queried according to the first equation in the target query condition.
作为一个可选实施例,所述初始查询条件包括第二等式,其中,所述S230,包括:根据所述第一等式和所述第二等式查询数据表,或者所述S230包括:根据所述第一等式、第二等式和第二不等式查询所述数据表查询。具体地,优化器模块在根据SQL语句生成最优的执行计划,需要底层扫描数据,然后对扫描过的数据通过hashjoin来过滤合适的数据,其中,在hashjoin查询的过程中需要包括第二等式,并且初始查询条件中包括第一不等式时,则需要将第一不等式重写为第二不等式和第一等式,利用hashjoin的第二等式,重写后的第二不等式和第一等式进行语句的查询,可以解决在hashjoin算法处理时利用第二等式过滤后每个hash桶对应数据量巨大的问题,通过第一等式,第二不等式和第二等式过滤合适的数据,降低hash桶中的数据量。As an optional embodiment, the initial query condition includes a second equation, where the S230 includes: querying a data table according to the first equation and the second equation, or the S230 includes: The data table query is queried according to the first equation, the second equation, and the second inequality. Specifically, the optimizer module generates an optimal execution plan according to the SQL statement, needs the underlying scan data, and then filters the scanned data through the hashjoin to filter the appropriate data, wherein the second equation needs to be included in the hashjoin query process. And when the first inequality is included in the initial query condition, the first inequality needs to be rewritten into the second inequality and the first inequal, the second inequality of the hashjoin, the second inequality after the rewriting, and the first equation are used. The query of the statement can solve the problem that the amount of data corresponding to each hash bucket is huge after filtering by the second equation in the hashjoin algorithm processing, and filtering the appropriate data by the first equation, the second inequality and the second equation, and reducing The amount of data in the hash bucket.
作为一个例子,现有技术中找出在时间相差60分钟内,且同一地点登机的乘客A和
乘客B的SQL初始查询条件为:As an example, in the prior art, passengers A and passengers who boarded the aircraft at the same time within 60 minutes are found.
Passenger B's initial SQL query conditions are:
Select*a.id,b.id from db a,db b where a.id<>b.id and a.port=b.port and a.time–b.time<60。Select*a.id, b.id from db a, db b where a.id<>b.id and a.port=b.port and a.time–b.time<60.
通过本申请实施例的初始查询条件重写后,找出在时间相差60分钟内,且同一地点登机的乘客A和乘客B的SQL目标查询条件为:After rewriting the initial query conditions of the embodiment of the present application, the SQL target query conditions of passengers A and B who are boarding in the same place within 60 minutes are found as follows:
Select*a.id,b.id from db a,db b where a.id<>b.id and a.port=b.port and trunc(a.time/60)=trunk(b.time/60)and a.time>=b.time。Select*a.id, b.id from db a, db b where a.id<>b.id and a.port=b.port and trunc(a.time/60)=trunk(b.time/60) And a.time>=b.time.
作为一个例子,图3对查询数据表的方法300进行描述,该方法300包括:As an example, FIG. 3 depicts a method 300 of querying a data table, the method 300 including:
S301,开始查询。S301, start the query.
S302,判断查询方法是否为selfjoin方法,若不是selfjoin方法,则跳到S307。S302. Determine whether the query method is a selfjoin method. If it is not the selfjoin method, skip to S307.
S303,在S302中,若是selfjoin方法,则判断查询代价,确定代价估算行数是否很大,例如判断行数大于某一特定预设阈值,当行数小于预设阈值时,或者例如前述的:当所述数据表中的数值的数量大于所述第一阈值时,认为数据量大,则跳到S307。S303. In S302, if it is a selfjoin method, determine a query cost, and determine whether the number of estimated cost rows is large, for example, determining that the number of rows is greater than a certain preset threshold, when the number of rows is less than a preset threshold, or for example, as described above: When the number of values in the data table is greater than the first threshold, if the amount of data is considered to be large, then the process jumps to S307.
S304,在S303中,若行数大于预设阈值时,则判断初始查询条件中是否与可转换的非等值表达式,若初始查询条件中没有可转换的非等值表达式,则跳到S307,例如该非等值表达式为前述的第一不等式。S304. In S303, if the number of rows is greater than a preset threshold, determine whether the initial query condition is a convertible non-equivalent expression. If there is no convertible non-equivalent expression in the initial query condition, skip to S307, for example, the non-equivalent expression is the aforementioned first inequality.
S305,若S304中,初始查询条件中存在可转换的非等值表达式,则将非等值表达式转换为等值表达式,可选地,将该非等值表达式转化为另外一个非等值表达式和等值表达式,例如,该等值表达式可以为前述的第一等式,另外一个非等式表达式可以为前述的第二不等式。S305. If there is a convertible non-equivalent expression in the initial query condition in S304, the non-equivalent expression is converted into an equivalent expression, and optionally, the non-equivalent expression is converted into another non-equivalent expression. Equivalent expressions and equivalent expressions, for example, the equivalent expression may be the first equation described above, and another non-equivalent expression may be the aforementioned second inequality.
S306,对目标查询条件进行重写,这里的目标查询条件可以包括等值表达式;或者,该目标查询条件可以包括另外一个非等值表达式和等值表达式。S306, rewriting the target query condition, where the target query condition may include an equivalent expression; or the target query condition may include another non-equivalent expression and an equivalent expression.
S307,物理查询优化阶段。S307, physical query optimization stage.
S308,结束查询优化。S308, ending the query optimization.
应理解,图3所示的方法仅是示例性的,S302,S303,S304三个判断的过程可以是满足至少一个条件,而并不是全部需要满足。例如,可以没有S302或S303等,本申请实施例不限于此。It should be understood that the method shown in FIG. 3 is merely exemplary, and the three judgment processes of S302, S303, and S304 may satisfy at least one condition, and not all need to be satisfied. For example, there may be no S302 or S303, etc., and embodiments of the present application are not limited thereto.
因此,当对海量的数据进行查询时,使用hashjoin算法中的第二等式和初始查询条件中的第一不等式进行查询时,查询效率较低,通过将第一不等式转化成第二不等式和第一等式条件,增加了查询数据的条件,进一步可以提高查询数据的效率。Therefore, when querying a large amount of data, when using the second equation in the hashjoin algorithm and the first inequality in the initial query condition, the query efficiency is low, by converting the first inequality into the second inequality and the first The one-element condition increases the conditions for querying data, which further improves the efficiency of querying data.
上面结合图2和图3对本申请实施例实施提供的查询数据表的方法进行描述,下面结合图4和图5描述本申请实施例提供的查询数据表的装置。The method for querying the data table provided by the embodiment of the present application is described above with reference to FIG. 2 and FIG. 3. The apparatus for querying the data table provided by the embodiment of the present application is described below with reference to FIG. 4 and FIG.
图4示出了本申请实施例提供的查询数据表的装置300,该装置300包括:FIG. 4 shows an apparatus 300 for querying a data table provided by an embodiment of the present application. The apparatus 300 includes:
获取模块410,用于获取初始查询条件,所述初始查询条件包括第一不等式;The obtaining module 410 is configured to obtain an initial query condition, where the initial query condition includes a first inequality;
转化模块420,用于将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式;The conversion module 420 is configured to perform conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation;
查询模块430,用于根据所述目标查询条件,查询数据表。The query module 430 is configured to query the data table according to the target query condition.
作为一个可选实施例,所述初始查询条件还包括第二等式,所述查询模块430具体用于:根据所述第一等式和所述第二等式查询所述数据表。As an optional embodiment, the initial query condition further includes a second equation, and the query module 430 is specifically configured to: query the data table according to the first equation and the second equation.
作为一个可选实施例,所述目标查询条件还包括不同于所述第一不等式的第二不等
式。As an optional embodiment, the target query condition further includes a second unequal difference from the first inequality
formula.
作为一个可选实施例,若所述第一不等式的形式为:A-B>C,则所述第一等式具体为:trunc(A/C)=trunc(B/C)+n,所述第二不等式具体为A>B,其中,trunc(·)为截尾取整运算,C为大于1的正整数,n为整数,A和B为所述数据表中的数值,并且A和B为正整数。As an optional embodiment, if the form of the first inequality is: AB>C, the first equation is specifically: trunc(A/C)=trunc(B/C)+n, the first The second inequality is specifically A>B, where trunc(·) is a truncated rounding operation, C is a positive integer greater than 1, n is an integer, A and B are the values in the data table, and A and B are A positive integer.
作为一个可选实施例,若所述第一不等式的形式为:D-E<F,所述第一等式具体为:trunc(D/F)=trunc(E/F)+m所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,D和E为所述数据表中的数值,并且D和E为正整数。As an optional embodiment, if the form of the first inequality is: DE<F, the first equation is specifically: trunc(D/F)=trunc(E/F)+m, the second inequality Specifically, D>E, where trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers .
作为一个可选实施例,若所述第一不等式的形式为:G-H<I,则所述第一等式具体为trunc(G/I)=trunc(H/I)+p和G<H,所述第二不等式为G<H,其中,trunc(·)为截尾取整运算,I为大于1的正整数,p为整数,G和H为所述数据表中的数值,并且G和H为正整数。As an optional embodiment, if the form of the first inequality is: GH<I, the first equation is specifically trunc(G/I)=trunc(H/I)+p and G<H, The second inequality is G<H, where trunc(·) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, G and H are values in the data table, and G and H is a positive integer.
作为一个可选实施例,所述初始查询条件为自连接selfjoin初始查询条件。As an optional embodiment, the initial query condition is a self-joining selfjoin initial query condition.
作为一个可选实施例,则所述转化模块420具体还用于:在将所述第一不等式进行转换处理,得到目标查询条件之前,判断所述数据表包括的数据量是否大于第一阈值;当所述数据表包括的数据量大于所述第一阈值时,将所述第一不等式进行转换处理,得到所述目标查询条件。As an optional embodiment, the conversion module 420 is further configured to: before converting the first inequality to obtain a target query condition, determining whether the data amount included in the data table is greater than a first threshold; When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
应理解,这里的装置400以功能模块的形式体现。这里的术语“模块”可以指ASIC、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。在一个可选例子中,本领域技术人员可以理解,装置400可以具体为上述实施例中的优化器,装置400可以用于执行上述方法实施例中与优化器对应的各个流程和/或步骤,为避免重复,在此不再赘述。It should be understood that the apparatus 400 herein is embodied in the form of a functional module. The term "module" as used herein may refer to an ASIC, an electronic circuit, a processor (eg, a shared processor, a proprietary processor or a group processor, etc.) and memory, a merge logic, and a processor for executing one or more software or firmware programs. / or other suitable components that support the described functionality. In an alternative example, those skilled in the art may understand that the device 400 may be specifically the optimizer in the foregoing embodiment, and the device 400 may be used to execute various processes and/or steps corresponding to the optimizer in the foregoing method embodiments. To avoid repetition, we will not repeat them here.
图5示出了本申请实施例提供的查询数据表的装置500,该装置500包括:接收器510、处理器520、发送器530、存储器540和总线系统550。其中,接收器510、处理器520、发送器530和存储器540通过总线系统550相连,该存储器540用于存储指令,该处理器520用于执行该存储器540存储的指令,以控制该接收器510接收信号,并控制该发送器530发送指令。其中,所述接收器510和所述发送器530可以是收发接口等,本申请实施例对此不作限制。FIG. 5 shows an apparatus 500 for querying a data table provided by an embodiment of the present application. The apparatus 500 includes a receiver 510, a processor 520, a transmitter 530, a memory 540, and a bus system 550. The receiver 510, the processor 520, the transmitter 530 and the memory 540 are connected by a bus system 550 for storing instructions for executing instructions stored in the memory 540 to control the receiver 510. A signal is received and the transmitter 530 is controlled to send an instruction. The receiver 510 and the transmitter 530 may be a transceiver interface, etc., which is not limited in this embodiment of the present application.
其中,接收器510用于获取初始查询条件,所述初始查询条件包括第一不等式;处理器520将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式;所述处理器520还用于根据所述目标查询条件,查询数据表。The receiver 510 is configured to obtain an initial query condition, where the initial query condition includes a first inequality; the processor 520 performs conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation. The processor 520 is further configured to query the data table according to the target query condition.
作为一个可选实施例,所述初始查询条件还包括第二等式,所述处理器520具体用于:根据所述第一等式和所述第二等式查询所述数据表。As an optional embodiment, the initial query condition further includes a second equation, and the processor 520 is specifically configured to: query the data table according to the first equation and the second equation.
作为一个可选实施例,所述目标查询条件还包括不同于所述第一不等式的第二不等式。As an optional embodiment, the target query condition further includes a second inequality different from the first inequality.
作为一个可选实施例,若所述第一不等式的形式为:A-B>C,则所述第一等式具体为:trunc(A/C)=trunc(B/C)+n,所述第二不等式具体为A>B,其中,trunc(·)为截尾取整运算,C为大于1的正整数,n为整数,A和B为所述数据表中的数值,并且A和B为正整数。
As an optional embodiment, if the form of the first inequality is: AB>C, the first equation is specifically: trunc(A/C)=trunc(B/C)+n, the first The second inequality is specifically A>B, where trunc(·) is a truncated rounding operation, C is a positive integer greater than 1, n is an integer, A and B are the values in the data table, and A and B are A positive integer.
作为一个可选实施例,若所述第一不等式的形式为:D-E<F,所述第一等式具体为:trunc(D/F)=trunc(E/F)+m所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,D和E为所述数据表中的数值,并且D和E为正整数。As an optional embodiment, if the form of the first inequality is: DE<F, the first equation is specifically: trunc(D/F)=trunc(E/F)+m, the second inequality Specifically, D>E, where trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, D and E are values in the data table, and D and E are positive integers .
作为一个可选实施例,若所述第一不等式的形式为:G-H<I,则所述第一等式具体为trunc(G/I)=trunc(H/I)+p和G<H,所述第二不等式为G<H,其中,trunc(·)为截尾取整运算,I为大于1的正整数,p为整数,G和H为所述数据表中的数值,并且G和H为正整数。As an optional embodiment, if the form of the first inequality is: GH<I, the first equation is specifically trunc(G/I)=trunc(H/I)+p and G<H, The second inequality is G<H, where trunc(·) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, G and H are values in the data table, and G and H is a positive integer.
作为一个可选实施例,所述初始查询条件为自连接selfjoin初始查询条件。As an optional embodiment, the initial query condition is a self-joining selfjoin initial query condition.
作为一个可选实施例,则所述处理器520具体还用于:在将所述第一不等式进行转换处理,得到目标查询条件之前,判断所述数据表包括的数据量是否大于第一阈值;当所述数据表包括的数据量大于所述第一阈值时,将所述第一不等式进行转换处理,得到所述目标查询条件。As an optional embodiment, the processor 520 is further configured to: before converting the first inequality to obtain a target query condition, determining whether the data amount included in the data table is greater than a first threshold; When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
也应理解,装置500可以具体为上述实施例中的优化器,并且可以用于执行上述方法实施例中与优化器对应的各个步骤和/或流程。可选地,该存储器540可以包括只读存储器和随机存取存储器,并向处理器提供指令和数据。存储器的一部分还可以包括非易失性随机存取存储器。例如,存储器还可以存储设备类型的信息。该处理器520可以用于执行存储器中存储的指令,并且该处理器执行该指令时,该处理器520可以执行上述方法实施例中与优化器对应的各个步骤。It should also be understood that the apparatus 500 may be specifically the optimizer in the above embodiments, and may be used to perform various steps and/or processes corresponding to the optimizer in the above method embodiments. Optionally, the memory 540 can include read only memory and random access memory and provides instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory. For example, the memory can also store information of the device type. The processor 520 can be configured to execute instructions stored in the memory, and when the processor executes the instructions, the processor 520 can perform the various steps corresponding to the optimizer in the above method embodiments.
应理解,在本申请实施例中,编号“第一”、“第二”仅仅为了区分不同的对象,例如,为了区分不同的用户标识或不同的属性标识等,不应对本申请实施例的保护范围构成任何限定。It should be understood that, in the embodiment of the present application, the numbers "first" and "second" are only used to distinguish different objects, for example, in order to distinguish different user identifiers or different attribute identifiers, etc., the protection of the embodiments of the present application should not be The scope constitutes any limitation.
本领域普通技术人员可以意识到,结合本文中所公开的实施例中描述的各方法步骤和单元,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各实施例的步骤及组成。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。本领域普通技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。Those skilled in the art will appreciate that the various method steps and elements described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both, in order to clearly illustrate hardware and software. Interchangeability, the steps and composition of the various embodiments have been generally described in terms of function in the foregoing description. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。A person skilled in the art can clearly understand that, for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
在本申请实施例所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。In the several embodiments provided by the embodiments of the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本申请实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present application.
另外,在本申请实施例中的各功能单元可以集成在一个处理单元中,也可以是各个
单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in the embodiment of the present application may be integrated into one processing unit, or may be each
Units exist physically alone, or two or more units can be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be substantially or partially contributed to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a The storage medium includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program code. .
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请实施例揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。
The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto, and any person skilled in the art can easily think of various kinds within the technical scope disclosed in the embodiments of the present application. Modifications or substitutions are intended to be included within the scope of the present application. Therefore, the scope of protection of this application should be determined by the scope of protection of the claims.
Claims (18)
- 一种查询数据表的方法,其特征在于,所述方法包括:A method for querying a data table, the method comprising:获取初始查询条件,所述初始查询条件包括第一不等式;Obtaining an initial query condition, where the initial query condition includes a first inequality;将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式;Performing conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation;根据所述目标查询条件,查询数据表。The data table is queried according to the target query condition.
- 根据权利要求1所述的方法,其特征在于,所述初始查询条件还包括第二等式,The method of claim 1 wherein said initial query condition further comprises a second equation,其中,所述根据所述目标查询条件,查询数据表,包括:The querying the data table according to the target query condition includes:根据所述第一等式和所述第二等式,查询所述数据表。The data table is queried according to the first equation and the second equation.
- 根据权利要求1或2所述的方法,其特征在于,所述目标查询条件还包括不同于所述第一不等式的第二不等式。The method of claim 1 or 2, wherein the target query condition further comprises a second inequality different from the first inequality.
- 根据权利要求3所述的方法,其特征在于,若所述第一不等式的形式为:A-B>C,则所述第一等式具体为:trunc(A/C)=trunc(B/C)+n,所述第二不等式具体为A>B,其中,trunc(·)为截尾取整运算,C为大于1的正整数,n为整数,A和B为所述数据表中的数值,并且A和B为正整数。The method according to claim 3, wherein if the form of the first inequality is: AB>C, the first equation is specifically: trunc(A/C)=trunc(B/C) +n, the second inequality is specifically A>B, wherein trunc(·) is a truncation rounding operation, C is a positive integer greater than 1, n is an integer, and A and B are values in the data table. And A and B are positive integers.
- 根据权利要求3或4所述的方法,其特征在于,若所述第一不等式的形式为:D-E<F,则所述第一等式具体为:trunc(D/F)=trunc(E/F)+m,所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,D和E为所述数据表中的数值,并且D和E为正整数。The method according to claim 3 or 4, wherein if the form of the first inequality is: DE<F, the first equation is specifically: trunc(D/F)=trunc(E/ F)+m, the second inequality is specifically D>E, wherein trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, and D and E are in the data table. The value, and D and E are positive integers.
- 根据权利要求3或4所述的方法,其特征在于,若所述第一不等式的形式为:G-H<I,则所述第一等式具体为trunc(G/I)=trunc(H/I)+p和G<H,所述第二不等式为G<H,其中,trunc(·)为截尾取整运算,I为大于1的正整数,p为整数,G和H为所述数据表中的数值,并且G和H为正整数。The method according to claim 3 or 4, wherein if the form of the first inequality is: GH < I, the first equation is specifically trunc (G / I) = trunc (H / I +p and G<H, the second inequality is G<H, where trunc(·) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, and G and H are the data The values in the table, and G and H are positive integers.
- 根据权利要求1至6中任一项所述的方法,其特征在于,所述初始查询条件为自连接selfjoin初始查询条件。The method according to any one of claims 1 to 6, wherein the initial query condition is a self-joining selfjoin initial query condition.
- 根据权利要求1至7中任一项所述的方法,其特征在于,在将所述第一不等式进行转换处理,得到目标查询条件之前,所述方法还包括:The method according to any one of claims 1 to 7, wherein before the converting the first inequality to obtain a target query condition, the method further comprises:判断所述数据表包括的数据量是否大于第一阈值;Determining whether the amount of data included in the data table is greater than a first threshold;所述将所述第一不等式进行转换处理,得到目标查询条件,包括:Performing conversion processing on the first inequality to obtain target query conditions, including:当所述数据表包括的数据量大于所述第一阈值时,将所述第一不等式进行转换处理,得到所述目标查询条件。When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
- 一种查询数据表的装置,其特征在于,所述装置包括:An apparatus for querying a data table, the apparatus comprising:获取模块,用于获取初始查询条件,所述初始查询条件包括第一不等式;An obtaining module, configured to obtain an initial query condition, where the initial query condition includes a first inequality;转化模块,用于将所述第一不等式进行转换处理,得到目标查询条件,所述目标查询条件包括第一等式;a conversion module, configured to perform conversion processing on the first inequality to obtain a target query condition, where the target query condition includes a first equation;查询模块,用于根据所述目标查询条件,查询数据表。The query module is configured to query the data table according to the target query condition.
- 根据权利要求9所述的装置,其特征在于,所述初始查询条件还包括第二等式,The apparatus according to claim 9, wherein said initial query condition further comprises a second equation,所述查询模块具体用于:根据所述第一等式和所述第二等式,查询所述数据表。The query module is specifically configured to: query the data table according to the first equation and the second equation.
- 根据权利要求9或10所述的装置,其特征在于,所述目标查询条件还包括不同于所述第一不等式的第二不等式。 The apparatus of claim 9 or 10, wherein the target query condition further comprises a second inequality different from the first inequality.
- 根据权利要求11所述的装置,其特征在于,若所述第一不等式的形式为:A-B>C,则所述第一等式具体为:trunc(A/C)=trunc(B/C)+n,所述第二不等式具体为A>B,其中,trunc(·)为截尾取整运算,C为大于1的正整数,n为整数,A和B为所述数据表中的数值,并且A和B为正整数。The apparatus according to claim 11, wherein if the form of the first inequality is: AB>C, the first equation is specifically: trunc(A/C)=trunc(B/C) +n, the second inequality is specifically A>B, wherein trunc(·) is a truncation rounding operation, C is a positive integer greater than 1, n is an integer, and A and B are values in the data table. And A and B are positive integers.
- 根据权利要求11或12所述的装置,其特征在于,若所述第一不等式的形式为:D-E<F,则所述第一等式具体为:trunc(D/F)=trunc(E/F)+m所述第二不等式具体为D>E,其中,trunc(·)为截尾取整运算,F为大于1的正整数,m为整数,D和E为所述数据表中的数值,并且D和E为正整数。The apparatus according to claim 11 or 12, wherein if the form of the first inequality is: DE<F, the first equation is specifically: trunc(D/F)=trunc(E/ The second inequality of F)+m is specifically D>E, wherein trunc(·) is a truncated rounding operation, F is a positive integer greater than 1, m is an integer, and D and E are in the data table. Value, and D and E are positive integers.
- 根据权利要求11或12所述的装置,其特征在于,若所述第一不等式的形式为:G-H<I,则所述第一等式具体为trunc(G/I)=trunc(H/I)+p和G<H,所述第二不等式为G<H,其中,trunc(·)为截尾取整运算,I为大于1的正整数,p为整数,G和H为所述数据表中的数值,并且G和H为正整数。The apparatus according to claim 11 or 12, wherein if the form of the first inequality is: GH < I, the first equation is specifically trunc (G / I) = trunc (H / I +p and G<H, the second inequality is G<H, where trunc(·) is a truncated rounding operation, I is a positive integer greater than 1, p is an integer, and G and H are the data The values in the table, and G and H are positive integers.
- 根据权利要求9至14中任一项所述的装置,其特征在于,所述初始查询条件为自连接selfjoin初始查询条件。The apparatus according to any one of claims 9 to 14, wherein the initial query condition is a self-joining selfjoin initial query condition.
- 根据权利要求9至15中任一项所述的装置,其特征在于,则所述转化模块具体还用于:The device according to any one of claims 9 to 15, wherein the conversion module is further used to:在将所述第一不等式进行转换处理,得到目标查询条件之前,判断所述数据表包括的数据量是否大于第一阈值;Before performing the conversion processing on the first inequality to obtain the target query condition, determining whether the data amount included in the data table is greater than a first threshold;当所述数据表包括的数据量大于所述第一阈值时,将所述第一不等式进行转换处理,得到所述目标查询条件。When the data amount included in the data table is greater than the first threshold, the first inequality is converted to obtain the target query condition.
- 一种查询数据表的装置,其特征在于,包括:An apparatus for querying a data table, comprising:存储器,用于存储程序;Memory for storing programs;处理器,用于执行所述存储器存储的所述程序,当所述程序被执行时,所述处理器用于执行如权利要求1-8中任意一项所述的方法。A processor for executing the program stored by the memory, the processor for performing the method of any one of claims 1-8 when the program is executed.
- 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1-8中任意一项所述的方法。 A computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1-8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611022774.XA CN108073641B (en) | 2016-11-18 | 2016-11-18 | Method and device for querying data table |
CN201611022774.X | 2016-11-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018090557A1 true WO2018090557A1 (en) | 2018-05-24 |
Family
ID=62146122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/081321 WO2018090557A1 (en) | 2016-11-18 | 2017-04-21 | Method and device for querying data table |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108073641B (en) |
WO (1) | WO2018090557A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111666279A (en) * | 2020-04-14 | 2020-09-15 | 阿里巴巴集团控股有限公司 | Query data processing method and device, electronic equipment and computer storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109656946B (en) * | 2018-09-29 | 2022-12-16 | 创新先进技术有限公司 | Multi-table association query method, device and equipment |
CN110147384B (en) * | 2019-04-17 | 2023-06-20 | 平安科技(深圳)有限公司 | Data search model establishment method, device, computer equipment and storage medium |
CN113296964B (en) * | 2021-07-28 | 2022-01-04 | 阿里云计算有限公司 | Data processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1342291A (en) * | 1999-02-19 | 2002-03-27 | Pc多媒体公司 | Matching engine |
US20110145220A1 (en) * | 2009-12-10 | 2011-06-16 | Ramakumar Kosuru | System and method for executing a query |
CN103838817A (en) * | 2012-11-26 | 2014-06-04 | 埃森哲环球服务有限公司 | Data consistency management |
CN103995879A (en) * | 2014-05-27 | 2014-08-20 | 华为技术有限公司 | Data query method, device and system based on OLAP system |
-
2016
- 2016-11-18 CN CN201611022774.XA patent/CN108073641B/en active Active
-
2017
- 2017-04-21 WO PCT/CN2017/081321 patent/WO2018090557A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1342291A (en) * | 1999-02-19 | 2002-03-27 | Pc多媒体公司 | Matching engine |
US20110145220A1 (en) * | 2009-12-10 | 2011-06-16 | Ramakumar Kosuru | System and method for executing a query |
CN103838817A (en) * | 2012-11-26 | 2014-06-04 | 埃森哲环球服务有限公司 | Data consistency management |
CN103995879A (en) * | 2014-05-27 | 2014-08-20 | 华为技术有限公司 | Data query method, device and system based on OLAP system |
Non-Patent Citations (1)
Title |
---|
SU, WEI: "The research and realization of dualistic constraint database and query optimization", CHINA MASTER'S THESES, 15 January 2009 (2009-01-15), pages 1138 - 1773 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111666279A (en) * | 2020-04-14 | 2020-09-15 | 阿里巴巴集团控股有限公司 | Query data processing method and device, electronic equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108073641A (en) | 2018-05-25 |
CN108073641B (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7273045B2 (en) | Dimensional Context Propagation Techniques for Optimizing SQL Query Plans | |
US20180060389A1 (en) | Query optimization over distributed heterogeneous execution engines | |
CN113711197B (en) | Placement of adaptive aggregation operators and attributes in query plans | |
US10162857B2 (en) | Optimized inequality join method | |
EP2608074B1 (en) | Systems and methods for merging source records in accordance with survivorship rules | |
US10311105B2 (en) | Filtering queried data on data stores | |
US9535956B2 (en) | Efficient set operation execution using a single group-by operation | |
US9298774B2 (en) | Changing the compression level of query plans | |
WO2018090557A1 (en) | Method and device for querying data table | |
US9171041B1 (en) | RLE-aware optimization of SQL queries | |
EP2746964A2 (en) | Automatic tuning of database queries | |
US9524305B2 (en) | Boolean term conversion for null-tolerant disjunctive predicates | |
US9760604B2 (en) | System and method for adaptive filtering of data requests | |
US20140074774A1 (en) | Distributed data base system and data structure for distributed data base | |
CN105335403A (en) | Database access method and device, and database system | |
US9430525B2 (en) | Access plan for a database query | |
US10776353B2 (en) | Application programming interface for database access | |
CN107291770B (en) | Mass data query method and device in distributed system | |
US10157234B1 (en) | Systems and methods for transforming datasets | |
US20150169656A1 (en) | Distributed database system | |
US20120078880A1 (en) | Accelerating Database Queries Containing Bitmap-Based Conditions | |
US9405801B2 (en) | Processing a data stream | |
US10521426B2 (en) | Query plan generation for split table query operations | |
US20180210926A1 (en) | Value identifier sets | |
US8126905B2 (en) | System, method, and computer-readable medium for optimizing the performance of outer joins |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17871271 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17871271 Country of ref document: EP Kind code of ref document: A1 |