CN113190577B - Table connection query method, device and storage medium - Google Patents

Table connection query method, device and storage medium Download PDF

Info

Publication number
CN113190577B
CN113190577B CN202110264828.8A CN202110264828A CN113190577B CN 113190577 B CN113190577 B CN 113190577B CN 202110264828 A CN202110264828 A CN 202110264828A CN 113190577 B CN113190577 B CN 113190577B
Authority
CN
China
Prior art keywords
data
query
small
result data
join
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110264828.8A
Other languages
Chinese (zh)
Other versions
CN113190577A (en
Inventor
吕征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd filed Critical New H3C Big Data Technologies Co Ltd
Priority to CN202110264828.8A priority Critical patent/CN113190577B/en
Publication of CN113190577A publication Critical patent/CN113190577A/en
Application granted granted Critical
Publication of CN113190577B publication Critical patent/CN113190577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries

Abstract

The disclosure provides a table connection query method, a table connection query device and a storage medium, which are used for solving the technical problem of Join query efficiency of HBase. According to the scheme provided by the disclosure, in the Join connection query of the big and small tables, the query result of the small table is used as the filtering condition of the big table, the query relational expression of the big table is constructed, and the result of the big table is queried by utilizing the HBase secondary index, so that the speed of the Join query of the big and small tables is increased.

Description

Table connection query method, device and storage medium
Technical Field
The present disclosure relates to the field of database and big data technologies, and in particular, to a table connection query method, apparatus, and storage medium.
Background
The HBase is a distributed storage System of structured data, is an Apache Hadoop database, is established on a Hadoop Distributed File System (HDFS), is designed to provide a high-reliability, high-performance, column storage, scalable and multi-version NoSql distributed data storage System, realizes real-time and random read-write requests of large data, and is suitable for storing unstructured data.
Although native HBase is suitable for storing massive data, it does not provide a good solution to the problem of join, i.e. join query, of two tables. Apache Hive can support the mainstream SQL language of a traditional relational database, but the query depends on a MapReduce calculation framework, the disk I/O time consumption during data access is large, the query speed is slow even if a table with large data volume and a table with small data volume perform Join operation, and most service scenes filter data from the large table according to the data of the small table, so that the efficiency of the large table Join of HBase is low.
Disclosure of Invention
In view of this, the present disclosure provides a table connection query method, device and storage medium, which are used to solve the technical problem of Join query efficiency of HBase.
Fig. 1 is a flowchart illustrating steps of a table join query method according to an embodiment of the present disclosure, where the method includes:
step 101, connecting the analysis table with a Join query statement to obtain a relational expression of two connection tables in the Join query statement;
step 102, dividing a connection table participating in Join query into a large table and a small table according to a preset size table dividing rule;
103, obtaining the table name, the query field and the filtering condition of the small table from the relational expression of the small table, sending the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the small table through the HBase middleware to obtain query result data, namely first result data, of the small table; the first result data comprises query column names and column values of small tables; when the small table has the filtering condition, the field column in the filtering condition of the small table is an index column;
104, constructing a filtering condition by using the query result data of the small table, and constructing and acquiring a third relational expression of the data of the large table by combining the relational expression of the large table;
105, acquiring the table name, the query field and the filtering condition of the large table from the third relation expression, sending the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the large table through the HBase middleware to obtain query result data of the large table, namely second result data; the second result data comprises a query column name and a column value of the large table; join column of the large table is an index column;
and 106, respectively acquiring records with equal Join column values from the first result data and the second result data, and splicing the acquired records to obtain a Join query result.
Further, the preset size table distinguishing rule is as follows: when both tables have or do not have the filtering condition, distinguishing the size table according to the size of the table storage file, wherein the table storage file with large capacity is the large table, and the table storage file with small capacity is the small table; when only one of the tables has a filtering condition, the table with the filtering condition is used as a small table, and the table without the filtering condition is used as a large table.
Further, when the small table is queried through the HBase middleware to obtain query result data of the small table, i.e. first result data, a batch acquisition mode is adopted, i.e. a first preset number of query result data are obtained from the small table as the first result data each time;
correspondingly, when the large table is queried through the HBase middleware to obtain query result data of the large table, namely second result data, a batch obtaining mode is adopted, namely, a second preset number of query result data are obtained from the large table every time to serve as the second result data;
after splicing the records with the same values of the columns of the first result data and the second result data, storing the records into a result set, after completing acquisition of a batch of large table data through a third relational expression, judging whether the large table data conforming to the third relational expression is acquired, if not, acquiring the next batch of second result data again, splicing the second batch of second result data with the first result data again, and storing the second result data into the result set; if the big table data is obtained, continuously judging whether the small table data which accords with the small table relational expression is obtained, and if the small table data is obtained, outputting the result set as a Join query result; and if the small table data is not obtained completely, continuously obtaining the next batch of small table data, reconstructing a new third relation expression according to the new batch of small table data, and executing the query and subsequent splicing processing steps of the large table data again.
Further, when the total number of records in the small table is smaller than a preset threshold, all small table records which accord with the relational expression of the small table are inquired at one time to serve as first result data, and the first result data are cached in a memory; and constructing a filtering condition based on the first result data, constructing a third relation expression by combining the relation expression of the large table, acquiring the large table data which accord with the third relation expression through HBase middleware in a one-time or batch acquisition mode, and splicing the second result data acquired in batches with the first result data respectively to output a Join query result.
Further, when the Join query includes a limitation on the number of returned records, after the splicing processing step is completed, the method also includes judging whether the number of the records reaches the limitation on the number of the returned records, if so, directly outputting the Join query result, and if not, continuing to execute the Join query result.
Fig. 2 is a schematic structural diagram of a table connection query apparatus provided in an embodiment of the present disclosure, and each functional module in the apparatus 200 may be implemented by software, hardware, or a combination of software and hardware. When a plurality of hardware devices implement the technical solution of the present disclosure together, since the purpose of mutual cooperation among the hardware devices is to achieve the purpose of the present disclosure together, and the action and the processing result of one party determine the execution timing of the action of the other party and the result that can be obtained, it can be considered that the execution main bodies have mutual cooperation relationship, and the execution main bodies have mutual command and control relationship. The apparatus 200 comprises:
the analysis module 201 is configured to analyze a table connection Join query statement to obtain a relational expression of two connection tables in the Join query statement;
a size table distinguishing module 202, configured to distinguish the connection table participating in Join query into a size table and a small table according to a preset size table distinguishing rule; the preset size table distinguishing rule is as follows: when both tables have or do not have the filtering condition, distinguishing the size table according to the size of the table storage file, wherein the table storage file with large capacity is the large table, and the table storage file with small capacity is the small table; when only one of the tables has the filtering condition, taking the table with the filtering condition as a small table and taking the table without the filtering condition as a large table;
the small table data acquisition module 203 is used for acquiring the table name, the queried field and the filtering condition of the small table from the relational expression of the small table, issuing the table name, the queried field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the small table through the HBase middleware to acquire query result data of the small table, namely first result data; the first result data comprises query column names and column values of small tables; when the small table has a filtering condition, fields in the filtering condition of the small table are listed as index columns;
a relational expression construction module 204, configured to construct a filtering condition using query result data of the small table, and construct a third relational expression for obtaining data of the large table in combination with a relational expression of the large table;
a large table data obtaining module 205, configured to obtain a table name, a query field, and a filtering condition of the large table from the third relational expression, send the table name, the query field, and the filtering condition to an HBase middleware supporting the secondary index, and obtain query result data, that is, second result data, of the large table by querying the large table through the HBase middleware; the second result data comprises a query column name and a column value of the large table; join column of the large table is an index column;
and the joint output module 206 is configured to obtain records with equal Join column values from the first result data and the second result data, and output a Join query result after splicing the obtained records.
Further, the small form data acquiring module 203 adopts a batch acquiring manner, that is, a first preset amount of query result data is acquired from the small form as first result data each time;
the large table data obtaining module 205 obtains the query result data in a batch manner, that is, obtains a second preset number of query result data from the large table as second result data each time;
the splicing output module 206 stores the records with the same value of the first result data and the second result data Join column into a result set after splicing, then judges whether the large table data conforming to the third relational expression is obtained or not, and if not, instructs the large table data obtaining module 205 to obtain the next batch of second result data again and performs splicing processing with the first result data again; if the big table data is obtained, continuously judging whether the small table data which accords with the small table relational expression is obtained, and if the small table data is obtained, outputting the result set as a Join query result; if the data of the small table is not obtained, the small table data obtaining module 203 is instructed to continue to obtain the next batch of small table data;
the relational expression constructing module 204 constructs a new third relational expression again according to the new batch of small table data, then instructs the large table data obtaining module 205 to execute again, and after the large table data obtaining module 205 obtains the new batch of large table data, the large table data obtaining module 205 triggers the splicing output module 206 to process again until the Join query result is output.
Further, the small table data obtaining module 203 is further configured to determine whether the total number of records in the small table is smaller than a preset threshold, and when the total number of records in the small table is smaller than the preset threshold, query all small table records that conform to the relational expression of the small table at one time as first result data and cache the first result data in the memory;
and constructing a filtering condition based on the first result data, constructing a third relation expression by combining the relation expression of the large table, acquiring the large table data which accord with the third relation expression through HBase middleware in a one-time or batch acquisition mode, and splicing the second result data acquired in batches with the first result data respectively to output a Join query result.
Further, the Join output module 206 is further configured to, when the Join query includes the limitation of the number of returned records, after the Join processing step is completed, determine whether the number of records reaches the limitation of the number of returned records, directly output the Join query result if the number of records reaches the limitation of the number of returned records, and continue the execution if the number of records does not reach the limitation of the number of returned records.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device 300 includes: a processor 310, such as a Central Processing Unit (CPU), a communication bus 320, a communication interface 340, and a storage medium 330. Wherein the processor 310 and the storage medium 330 may communicate with each other through a communication bus 320. The storage medium 330 stores therein a computer program which, when executed by the processor 310, implements the steps of the table join query method provided by the present disclosure.
The storage medium may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example, at least one disk Memory. In addition, the storage medium may be at least one memory device located remotely from the processor. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), etc.; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
According to the scheme provided by the disclosure, in the Join connection query of the big and small tables, the query result of the small table is used as the filtering condition of the big table, the query relational expression of the big table is constructed, and the result of the big table is queried by utilizing the HBase secondary index, so that the query speed of the big and small tables is increased.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present disclosure or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present disclosure, and other drawings can be obtained by those skilled in the art according to the drawings of the embodiments of the present disclosure.
FIG. 1 is a flowchart illustrating steps of a table join query method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a table connection query apparatus according to an embodiment of the disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present disclosure. As used in the embodiments of the present disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used in this disclosure is meant to encompass any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information in the embodiments of the present disclosure, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of embodiments of the present disclosure. Depending on the context, moreover, the word "if" as used may be interpreted as "at … …" or "at … …" or "in response to a determination.
In order to optimize the connection of the big table and the small table of HBase, namely Join query performance and effectively improve the Join query efficiency of the big table and the small table, the core idea of the table connection query method is to utilize the advantage of high second-level index query speed of HBase, take the query result of the Join column of the small table in two tables for executing the Join as the filtering condition of the big table, query the big table according to the filtering condition to obtain the query result set of the big table, and finally combine the query results of the two tables and return the query result set to a client.
The following SQL statement is an example of Join query, in which a table TableA and a table TableB execute Join, i.e., a Jion query, two column fields for executing Join are respectively a key1 field column of the table TableA and a key2 field column of the table TableB, key1 and key2 are simply called Join columns, and a Where part is a filter condition.
Select<selectlist>
From TableA.A Left Join TableB.B ON A.key1=B.key2
Where B.key2 is NULL
The part < select > is a query column name list, two tables TableA and TableB participating in the Join query are called connection tables, two sides of a keyword "Join" are connection tables, and the keyword "ON" is used for designating the Join columns of the two connection tables.
The following steps of the table connection query method provided in an embodiment of the present disclosure are performed in the following manner, where the method includes:
step 401, analyzing the Join query statement by using a calcite component to obtain a relational expression RelNode of two connection tables in the Join query statement.
In this embodiment, Join Query Language is parsed by using calcite, which is a Structured Query Language (SQL) parsing tool.
The relational expression is a form of calcite representation relational algebra, the relational algebra is procedural query language, SQL statements are abstracted in the calcite to be an interface RelNode, and specific common relational expression forms are TableSacn, Project, Filter and the like.
And step 402, distinguishing the tables participating in Join query into a large table and a small table according to a preset size table distinguishing rule.
In the step, two tables participating in Join query are divided into size tables according to a preset division rule, wherein the division rule is as follows:
a) when both tables have or do not have the filtering condition, distinguishing the size table according to the size of the table storage file, wherein the table storage file with large capacity is the large table, and the table storage file with small capacity is the small table;
b) when only one of the tables has a filtering condition, the table with the filtering condition is set as a small table, and the table without the filtering condition is set as a large table.
And step 403, taking out the relational expression RelNode1 of the small table, obtaining the table name, the query field and the filter condition of the small table from the RelNode1, and sending the table name, the query field and the filter condition to HBase middleware supporting secondary indexes to query the data of the small table.
After the two tables are distinguished into a large table and a small table according to distinguishing rules, the table name, the query field and the filtering condition of the small table are obtained from a relational expression RelNode1 of the small table, the obtained information is sent to an HBase middleware supporting secondary indexes, and the HBase middleware queries data of the small table according to the obtained small table information.
Since HBase middleware supporting a secondary index is required to efficiently perform a query of small table data, a field column in a filter condition of a small table is required to be an index column. In addition, because the Join column of the large table is also issued as a filter condition to the HBase middleware supporting the secondary index for result query, the Join column of the large table also needs to be an index column.
Step 404, inquiring the small table through the HBase middleware to obtain a preset number of inquiry Result data Result 1;
in this step, because the records in the obtained small table query Result data Result1 need to be converted into query filter conditions for large table data, the number of records in Result1 cannot be too large, otherwise the size limit of the filter conditions generated by calcite is exceeded. The predetermined number of steps can be obtained by actual testing, for example, 100 records are usually selected without error.
Step 405, constructing a line expression RexNode according to query Result data Result1 of the small table;
the line expression is calcite, which represents the processing logic for a line of data, including the type of data. Common line expressions are the literal RexLiteral, the variable RexVariable, the function RexCall, and so on.
Step 406, using a RelOptUtil interface of calcite, and taking a row expression RexNode of a small table and a relational expression RelNode2 of a large table as input to obtain a relational expression RelNode3 for inquiring data of the large table;
step 407, obtaining the table name, the query field and the filtering condition of the large table from the relational expression RelNode3, and sending the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes to query the data of the large table;
step 408, inquiring the large table through the HBase middleware to obtain Result data Result2 with preset quantity;
the preset number in the step can be determined by combining software and hardware configuration of the server and load conditions, and can also be determined by adopting a mode of smaller fixed parameters.
Step 409, according to the Join mode in the Join query, respectively acquiring records with equal Join column values from the small table Result data Result1 and the large table Result data Result2, respectively taking out the records from Result1 and Result2, splicing the records into a row of Join query Result data, and storing the joined List in a set List;
respectively containing query column names and column values of small tables and query column names and column values of large tables in Result1 and Result2, respectively searching records with equal values of the Join columns from Result1 and Result2 according to the names of the Join columns, respectively taking the searched records from Result1 and Result2, splicing into a row of data, and storing the data into a set List, wherein the set List is used for storing the query results of the Join queries.
Step 409, judging whether the size of the set List reaches the number of the inquired pieces, and if so, returning the List result to the client; otherwise, go to step 410;
and setting a Limit parameter of the number of records of the query result data in the Join query statement, if the Limit parameter is set, executing the step, and when the number of records of the returned result data reaches the Limit parameter Limit number, finishing the query and returning the Join query result to the query client. If the Limit parameter is not set in the Join query, step 410 may be performed directly.
Step 410, judging whether the data of the large table is acquired completely, if the data of the large table is not acquired completely, executing step 408, and if the data of the large table is acquired completely, executing step 411;
and 411, judging whether the data of the small table is acquired, if not, executing step 404, and if so, returning a List result of the query result set to the client.
The following describes the SQL execution flow of Join query with an example:
there are two TABLEs, TABLE _ T (teacher TABLE) and TABLE _ C (curriculum TABLE), the primary key of TABLE _ T is T _ ID, the index column of TABLE _ C is NAME, as follows:
TABLE _ T teacher TABLE
T_ID NAME
1 zhangsan
2 lisi
3 wangwu
4 zhaoliu
TABLE _ C curriculum schedule
C_ID NAME T_ID
1 Computer system structure 1
2 Discrete mathematics 3
3 C language 4
4 C language 2
If the teacher teaches the C language courses, the following Join query sentences are required:
select t.T _ ID, t.name from TABLE _ TtJoin TABLE _ ccont.t _ ID c.T _ ID where c.name ═ C language'
Based on the aforementioned discrimination rule of the size TABLE, since field c.name of the filter condition is an index column of TABLE _ C, TABLE _ C is determined as a small TABLE, and TABLE _ T is determined as a large TABLE.
S1, analyzing the Join query statement by using calcite, and converting the SQL statement into a relational expression RelNode;
s2, the relational expression of the small TABLE TABLE _ C is RelNode1, and the relational expression of the large TABLE TABLE _ T is RelNode 2;
s3, firstly, executing query according to RelNode1, acquiescing to obtain 100 results each time, wherein the query result is as follows:
[ (NAME: C language, T _ ID:4), (NAME: C language: T _ ID:2), … ];
s4, taking out the value of the T _ ID in the query result to form a T _ ID set, taking the T _ ID set as a filtering condition, and then constructing a filtering relational expression RelNode3 according to a relational expression RelNode2 of a large table, wherein the relational expression RelNode3 is equivalent to the following SQL conditional query statement:
select t.T_ID,t.t_NAME from TABLE_Ttwhere t.T_ID in(4,2,…);
RelNode3 is used to query out from the large TABLE TABLE _ T all records of T _ ID in set (4,2, …), where set (4,2, …) is the set of all teacher's T _ ID in the query result of the small TABLE relational expression RelNode 1.
S5, executing query according to the RelNode3, acquiescing to sequentially obtain 100 records from the large TABLE TABLE _ T each time, and obtaining the following query result, wherein the query result comprises teacher ID and teacher name of the professor C language:
[(T_ID:4,NAME:zhaoliu),(T_ID:2,NAME:lisi),…];
s6, if the limit of the number of the query records is met, returning a result [ (T _ ID:4, NAME: zhaoliu), (T _ ID:2, NAME: lisi), … ] to the client query client; otherwise, judging whether the RelNode3 can obtain data, if so, continuing to the steps S5-S6, if not, judging whether the RelNode1 can obtain data, if so, continuing to the steps S3-S6, if not, returning a complete query result to the query client, and finishing the query.
In another embodiment of the present disclosure, in order to improve the query efficiency, the relational expression RelNode1 of the small table may be used to query the records of all C-language courses in all the courses tables at a time and cache the query result in the memory, and then a T _ ID set including all teacher IDs is generated at a time according to the query result, so that after the RelNode3 is constructed according to the relational expression RelNode2 of the large table, the result of all Join queries may be queried by using the RelNode3, and there is no need to obtain data of the small table from the small table for multiple times. However, in this processing method, the small table records meeting the conditions need to be cached in the memory, the query efficiency can be improved under the condition that the small table data volume is not large, when the small table data volume is large, more memory space needs to be consumed, and the specific selection of which method is used can be used for judging whether to use the method for obtaining the memory cache once according to a certain decision rule by adding an intelligent judgment program. The decision rule may be to decide according to whether the data amount of the small table exceeds a certain threshold, for example, when the number of records of the small table is less than a preset threshold, the decision is made in a one-time acquisition and caching manner, otherwise, a batch acquisition manner is selected.
According to the scheme provided by the disclosure, in the Join connection query of the big and small tables, the query result of the small table is used as the filtering condition of the big table, the query relational expression of the big table is constructed, and the result of the big table is queried by utilizing the HBase secondary index, so that the query speed of the big and small tables is increased.
It should be recognized that embodiments of the present disclosure can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose. Further, operations of processes described by the present disclosure may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described in this disclosure (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the disclosure may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network.
The above description is only an example of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A table join query method, the method comprising:
the analysis table is connected with the Join query statement to obtain a relational expression of two connection tables in the Join query statement;
dividing a connection table participating in Join query into a large table and a small table according to a preset size table dividing rule;
obtaining the table name, the query field and the filtering condition of the small table from the relational expression of the small table, sending the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the small table through the HBase middleware to obtain query result data, namely first result data, of the small table; the first result data comprises query column names and column values of small tables; when the small table has a filtering condition, fields in the filtering condition of the small table are listed as index columns;
constructing a filtering condition by using the query result data of the small table, and constructing and acquiring a third relational expression of the large table data by combining the relational expression of the large table;
obtaining the table name, the query field and the filtering condition of the large table from the third relational expression, sending the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the large table through the HBase middleware to obtain query result data of the large table, namely second result data; the second result data comprises a query column name and a column value of the large table; join column of the large table is an index column;
and respectively acquiring records with equal values of Join columns from the first result data and the second result data, and splicing the acquired records to obtain a Join query result.
2. The method of claim 1,
the preset size table distinguishing rule is as follows: when both tables have or do not have the filtering condition, distinguishing the size table according to the size of the table storage file, wherein the table storage file with large capacity is the large table, and the table storage file with small capacity is the small table; when only one of the tables has a filtering condition, the table with the filtering condition is set as a small table, and the table without the filtering condition is set as a large table.
3. The method according to claim 1, wherein when the HBase middleware queries the small table to obtain the first result data, the query result data of the small table is obtained in a batch manner, that is, each time a first preset number of query result data are obtained from the small table as the first result data;
correspondingly, when the large table is queried through the HBase middleware to obtain query result data of the large table, namely second result data, a batch obtaining mode is also adopted, namely, a second preset number of query result data are obtained from the large table every time to serve as the second result data;
after splicing the records with the same values of the columns of the first result data and the second result data, storing the records into a result set, after completing acquisition of a batch of large table data through a third relational expression, judging whether the large table data conforming to the third relational expression is acquired, if not, acquiring the next batch of second result data again, splicing the second batch of second result data with the first result data again, and storing the second result data into the result set; if the big table data is obtained, continuously judging whether the small table data which accords with the small table relational expression is obtained, and if the small table data is obtained, outputting the result set as a Join query result; and if the small table data is not obtained completely, continuously obtaining the next batch of small table data, reconstructing a new third relation expression according to the new batch of small table data, and executing the query and subsequent splicing processing steps of the large table data again.
4. The method of claim 3,
when the total number of records in the small table is smaller than a preset threshold, inquiring all small table records which accord with the relational expression of the small table at one time to serve as first result data, and caching the first result data in a memory;
and constructing a filtering condition based on the first result data, constructing a third relation expression by combining a relation expression of the large table, acquiring the large table data which accord with the third relation expression through HBase middleware in a one-time or batch acquisition mode, and splicing the second result data acquired in batch with the first result data respectively to output a Join query result.
5. The method of claim 4,
when the Join query includes the limitation of the number of the returned records, after the splicing processing step is completed, the method also includes judging whether the number of the records reaches the limitation of the number of the returned records, if so, directly outputting the Join query result, and if not, continuing to execute.
6. A table join query apparatus, the apparatus comprising:
the analysis module is used for analyzing the table connection Join query statement to obtain a relational expression of two connection tables in the Join query statement;
the big and small table distinguishing module is used for distinguishing the connection table participating in Join query into a big table and a small table according to a preset big and small table distinguishing rule; the preset size table distinguishing rule is as follows: when both tables have or do not have the filtering condition, distinguishing the size table according to the size of the table storage file, wherein the table storage file with large capacity is the large table, and the table storage file with small capacity is the small table; when only one of the tables has the filtering condition, taking the table with the filtering condition as a small table and taking the table without the filtering condition as a large table;
the small table data acquisition module is used for acquiring the table name, the query field and the filtering condition of the small table from the relational expression of the small table, issuing the table name, the query field and the filtering condition to an HBase middleware supporting secondary indexes, and querying the small table through the HBase middleware to acquire query result data of the small table, namely first result data; the first result data comprises query column names and column values of small tables; when the small table has a filtering condition, fields in the filtering condition of the small table are listed as index columns;
the relational expression construction module is used for constructing a filtering condition by using the query result data of the small table and combining the relational expression construction of the large table to obtain a third relational expression of the data of the large table;
the big table data acquisition module is used for acquiring the table name, the query field and the filtering condition of the big table from the third relation expression, sending the table name, the query field and the filtering condition to the HBase middleware supporting the secondary index, and querying the big table through the HBase middleware to acquire query result data of the big table, namely second result data; the second result data comprises a query column name and a column value of the large table; join column of the large table is an index column;
and the joint output module is used for respectively acquiring records with equal Join column values from the first result data and the second result data, and jointing the acquired records and then outputting a Join query result.
7. The apparatus of claim 6,
the small form data acquisition module adopts a batch acquisition mode, namely acquiring a first preset number of query result data from the small form as first result data each time;
the large table data acquisition module adopts a batch acquisition mode, namely, a second preset amount of query result data is acquired from the large table every time and is used as second result data;
the splicing output module splices the records with the same values of the first result data and the second result data Join column and then stores the records into a result set, then judges whether the big table data conforming to a third relational expression is obtained or not, and if not, instructs the big table data obtaining module to obtain the next batch of second result data again and executes splicing processing with the first result data again; if the big table data is obtained, continuously judging whether the small table data which accords with the small table relational expression is obtained, and if the small table data is obtained, outputting the result set as a Join query result; if the data of the small table is not obtained, instructing the small table data obtaining module to continue to obtain the data of the next batch of small tables;
and the relational expression constructing module is used for constructing a new third relational expression again according to the new batch of small table data, then instructing the large table data acquisition module to execute again, and triggering the splicing output module to process again after the large table data acquisition module acquires the new batch of large table data until a Join query result is output.
8. The apparatus of claim 7,
the small table data acquisition module is further used for judging whether the total number of records in the small table is smaller than a preset threshold, and when the total number of records in the small table is smaller than the preset threshold, inquiring all small table records which conform to the relational expression of the small table at one time to serve as first result data and caching the first result data in a memory;
and constructing a filtering condition based on the first result data, constructing a third relation expression by combining a relation expression of the large table, acquiring the large table data which accord with the third relation expression through HBase middleware in a one-time or batch acquisition mode, and splicing the second result data acquired in batch with the first result data respectively to output a Join query result.
9. The apparatus of claim 8,
and the splicing output module is also used for judging whether the number of the records reaches the limit of the number of the returned records or not after the splicing processing step is completed when the Join query includes the limit of the number of the returned records, directly outputting the Join query result if the number of the records reaches the limit of the number of the returned records, and continuously executing if the number of the records does not reach the limit of the number of the returned records.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 5.
CN202110264828.8A 2021-03-11 2021-03-11 Table connection query method, device and storage medium Active CN113190577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110264828.8A CN113190577B (en) 2021-03-11 2021-03-11 Table connection query method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110264828.8A CN113190577B (en) 2021-03-11 2021-03-11 Table connection query method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113190577A CN113190577A (en) 2021-07-30
CN113190577B true CN113190577B (en) 2022-08-30

Family

ID=76973179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110264828.8A Active CN113190577B (en) 2021-03-11 2021-03-11 Table connection query method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113190577B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186651A (en) * 2011-12-31 2013-07-03 中国移动通信集团公司 Distributed relational database as well as method and device for building and querying same
CN104376047A (en) * 2014-10-28 2015-02-25 浪潮电子信息产业股份有限公司 Big table join method based on HBase
CN104794247A (en) * 2015-05-14 2015-07-22 东南大学 Integrated query method for multi-structure database
CN105205162A (en) * 2015-09-29 2015-12-30 烽火通信科技股份有限公司 HBase secondary-index storage and query system and query method thereof
BR102013010416A2 (en) * 2013-04-29 2016-03-22 Univ Fed Ouro Preto optimization of left outer join operation over big data using apache hadoop and apache hbase
CN106294499A (en) * 2015-06-09 2017-01-04 阿里巴巴集团控股有限公司 A kind of database data querying method and equipment
CN106951430A (en) * 2016-12-28 2017-07-14 金蝶软件(中国)有限公司 Account table querying method and device
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch
JP2019204472A (en) * 2018-05-22 2019-11-28 広東技術師範学院 Method for reading plurality of small files of 2 mb or smaller from hdfs having data merge module and hbase cash module on the basis of hadoop
CN112256704A (en) * 2020-10-23 2021-01-22 山东超越数控电子股份有限公司 Quick join method, storage medium and computer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809713B2 (en) * 2006-03-15 2010-10-05 Oracle International Corporation Efficient search space analysis for join factorization

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186651A (en) * 2011-12-31 2013-07-03 中国移动通信集团公司 Distributed relational database as well as method and device for building and querying same
BR102013010416A2 (en) * 2013-04-29 2016-03-22 Univ Fed Ouro Preto optimization of left outer join operation over big data using apache hadoop and apache hbase
CN104376047A (en) * 2014-10-28 2015-02-25 浪潮电子信息产业股份有限公司 Big table join method based on HBase
CN104794247A (en) * 2015-05-14 2015-07-22 东南大学 Integrated query method for multi-structure database
CN106294499A (en) * 2015-06-09 2017-01-04 阿里巴巴集团控股有限公司 A kind of database data querying method and equipment
CN105205162A (en) * 2015-09-29 2015-12-30 烽火通信科技股份有限公司 HBase secondary-index storage and query system and query method thereof
CN106951430A (en) * 2016-12-28 2017-07-14 金蝶软件(中国)有限公司 Account table querying method and device
JP2019204472A (en) * 2018-05-22 2019-11-28 広東技術師範学院 Method for reading plurality of small files of 2 mb or smaller from hdfs having data merge module and hbase cash module on the basis of hadoop
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch
CN112256704A (en) * 2020-10-23 2021-01-22 山东超越数控电子股份有限公司 Quick join method, storage medium and computer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Search-Based Test Data Generation for SQL Queries;Jeroen Castelein et al.;《IEEE Xplore》;20180903;全文 *
基于分布式系统OceanBase的并行连接;徐石磊 等;《华东师范大学学报(自然科学版)》;20170930(第5期);全文 *
基于索引结构的关系数据库关键词检索;张玉峰;《万方学位论文数据库》;20160504;全文 *

Also Published As

Publication number Publication date
CN113190577A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
US10824622B2 (en) Data statistics in data management systems
JP5298117B2 (en) Data merging in distributed computing
WO2016011904A1 (en) Database access method and apparatus and database system
US8417690B2 (en) Automatically avoiding unconstrained cartesian product joins
CN112988782B (en) Hive-supported interactive query method and device and storage medium
US11003649B2 (en) Index establishment method and device
CN110795455A (en) Dependency relationship analysis method, electronic device, computer device and readable storage medium
CN106294695A (en) A kind of implementation method towards the biggest data search engine
CN102541631B (en) Execution plans with different driver sources in multiple threads
CN110502532B (en) Method, device, equipment and storage medium for optimizing remote database object
WO2021259217A1 (en) Data association query method and apparatus, and device and storage medium
EP3251030B1 (en) Workload aware data placement for join-based query processing in a cluster
CN104111958A (en) Data query method and device
US11461333B2 (en) Vertical union of feature-based datasets
CN114461603A (en) Multi-source heterogeneous data fusion method and device
CN103605848A (en) Method and device for analyzing paths
CN111258978A (en) Data storage method
US10678789B2 (en) Batch data query method and apparatus
US20170068703A1 (en) Local database cache
CN113918605A (en) Data query method, device, equipment and computer storage medium
CN113190577B (en) Table connection query method, device and storage medium
CN115658680A (en) Data storage method, data query method and related device
CN109117426B (en) Distributed database query method, device, equipment and storage medium
CN111046066B (en) Remote database object optimization method, device, equipment and storage medium
US20200311067A1 (en) Database partition pruning using dependency graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant