CN103995879A - Data query method, device and system based on OLAP system - Google Patents

Data query method, device and system based on OLAP system

Info

Publication number
CN103995879A
Authority
CN
China
Prior art keywords
data table
data
partition
partitions
ordered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410228109.0A
Other languages
Chinese (zh)
Other versions
CN103995879B (en)
Inventor
张宇
范旭
张兵
朱银聪
王方舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201410228109.0A
Publication of CN103995879A
Application granted
Publication of CN103995879B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data query method, device and system based on an OLAP system. The data query method includes the steps that a data query request sent by a user terminal is received, a first data table corresponding to first data table identification is partitioned logically to obtain at least two first partitions, and a corresponding connection mapping relation is built for each first partition; a second data table corresponding to second data table identification is partitioned logically to obtain at least two second partitions, a first foreign key value is obtained from each second partition, and the first foreign key values are within the boundary value range of the first partitions; the connection mapping relations are queried to obtain a first Hash subtable corresponding to the first foreign key values; the first Hash subtable is scanned to obtain corresponding data, and the data are returned to the user terminal. According to the data query method, through setting up the connection mapping relations, data query time is effectively shortened, and overhead of a server is reduced.

Description

Data query method, device and system based on OLAP system
Technical Field
Embodiments of the invention relate to computer technology, and in particular to a data query method, device and system based on an On-Line Analytical Processing (OLAP) system.
Background
OLAP is a typical application scenario of a database system and is mainly used to perform query operations on data. The query process often involves a joint query over a plurality of data tables, in which the tables need to be associated through a connection (join) operation; the most common connection operation is the hash connection (hash join).
In the prior art, a data query method using hash connection mainly proceeds as follows: a background server of the database system receives a data query message sent by a user; according to the first data table identifier and the second data table identifier carried in the data query message, it performs a hash operation on the primary key of the first data table, which has the smaller number of rows, and establishes a shared hash table; it then scans the second data table in parallel, where each query thread performs a hash operation on the foreign key to obtain the hash value corresponding to that thread, and scans the shared hash table in parallel according to that hash value to obtain the complete data required by the user.
However, multi-thread parallel queries are subject to write conflicts, so when there are many parallel threads, the data query time is long and the overhead of the server is large.
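The prior-art scheme described above can be illustrated with a minimal Python sketch. The toy tables, the thread count, and in particular the placement of the lock during the build phase are assumptions made for illustration; the text does not state exactly where the write conflict arises.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy tables: (primary_key, payload) and (foreign_key, payload).
first_table = [(k, f"dim-{k}") for k in range(1, 9)]
second_table = [(k % 8 + 1, f"fact-{k}") for k in range(20)]

shared_hash_table = {}
build_lock = threading.Lock()  # assumption: all writers contend on one lock


def build(rows):
    """Insert rows into the single shared hash table."""
    for key, payload in rows:
        with build_lock:  # writes from different threads are serialized here
            shared_hash_table[key] = payload


def probe(rows):
    """Look up each foreign key in the shared hash table."""
    return [(fk, payload, shared_hash_table[fk]) for fk, payload in rows]


NUM_THREADS = 4
with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    # Build phase: every thread writes into the same table.
    list(pool.map(build, [first_table[i::NUM_THREADS] for i in range(NUM_THREADS)]))
    # Probe phase: every thread reads the same table.
    parts = pool.map(probe, [second_table[i::NUM_THREADS] for i in range(NUM_THREADS)])
    joined = [row for part in parts for row in part]
```

With a single shared structure, adding threads adds contention on the same lock, which is the cost the partitioned scheme in this document aims to avoid.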
Disclosure of Invention
Embodiments of the present invention provide a data query method, apparatus, and system based on an OLAP system, so as to overcome the problems of long multi-thread query time and high server overhead in the prior art. The first data table and the second data table are partitioned and a connection mapping relationship is established, so that during a multi-thread query each thread obtains and scans a first hash sub-table through the connection mapping relationship to acquire data, which effectively shortens the data query time and reduces the server overhead.
A first aspect of an embodiment of the present invention provides a data query method based on an OLAP system, including:
receiving a data query request sent by a user terminal, wherein the data query request comprises query information, a first data table identifier and a second data table identifier;
according to the data query request, performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions, and establishing a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes: the boundary value of the first partition and the corresponding hash sub-table;
partitioning a second data table corresponding to the second data table identifier to obtain at least two second partitions, and obtaining a first foreign key value from each second partition, wherein the first foreign key value is within a boundary value range of one first partition;
querying the connection mapping relationship to obtain a first hash sub-table corresponding to the first foreign key value;
and scanning the first hash sub-table, acquiring data corresponding to the query information, and returning the data to the user terminal.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions, and establishing a connection mapping relationship includes:
partitioning the first data table according to the inherent sequence of the first data table to obtain at least two first partitions;
for each first partition, performing hash operation according to the primary key of the first partition, and establishing a corresponding hash sub-table;
and for each first partition, establishing a corresponding connection mapping relation according to the Hash sub-table corresponding to the first partition.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions includes:
if the first data table is ordered, performing logical partition processing on the first data table according to the inherent order of the first data table to obtain at least two first partitions;
or,
if the first data table is unordered, adding an ordered proxy column to the first data table as a primary key column, and performing logical partition processing on the first data table according to the order of the ordered proxy column to obtain at least two first partitions.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the performing partition processing on the second data table corresponding to the second data table identifier to obtain at least two second partitions includes:
if the first data table is ordered and the second data table is ordered, partitioning the second data table according to the inherent order of the second data table and the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is ordered and the second data table is unordered, partitioning the second data table according to the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is unordered and the second data table is unordered, replacing the original key value and the foreign key value in the second data table with the new primary key value corresponding to the ordered proxy column in the first data table, and partitioning the replaced second data table according to the parallel processing capacity to obtain at least two second partitions.
With reference to the first aspect and any one of the first to third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the scanning the first hash sub-table to obtain data corresponding to the query information includes:
and scanning the first Hash sub-table, and acquiring all data information in the associated rows in the first data table and the second data table corresponding to the query information as the data.
A second aspect of the embodiments of the present invention provides a data query apparatus based on an OLAP system, including:
a transceiver module, configured to receive a data query request sent by a user terminal, where the data query request includes query information, a first data table identifier and a second data table identifier;
a processing module, configured to perform logical partition processing on the first data table corresponding to the first data table identifier according to the data query request, obtain at least two first partitions, and establish a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes: the boundary value of the first partition and the corresponding hash sub-table;
the processing module is further configured to perform partition processing on a second data table corresponding to the second data table identifier, acquire at least two second partitions, and acquire, for each second partition, a first foreign key value from the second partition, where the first foreign key value is within a boundary value range of one first partition;
an obtaining module, configured to query the connection mapping relationship and obtain a first hash sub-table corresponding to the first foreign key value;
the obtaining module is further configured to scan the first hash sub-table, acquire data corresponding to the query information, and return the data to the user terminal through the transceiver module.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the processing module is specifically configured to:
partitioning the first data table according to the inherent sequence of the first data table to obtain at least two first partitions;
for each first partition, performing hash operation according to the primary key of the first partition, and establishing a corresponding hash sub-table;
and for each first partition, establishing a corresponding connection mapping relation according to the Hash sub-table corresponding to the first partition.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the processing module is configured to:
if the first data table is ordered, performing logical partition processing on the first data table according to the inherent order of the first data table to obtain at least two first partitions;
or,
if the first data table is unordered, adding an ordered proxy column to the first data table as a primary key column, and performing logical partition processing on the first data table according to the order of the ordered proxy column to obtain at least two first partitions.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the processing module is further configured to:
if the first data table is ordered and the second data table is ordered, partitioning the second data table according to the inherent order of the second data table and the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is ordered and the second data table is unordered, partitioning the second data table according to the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is unordered and the second data table is unordered, replacing the original key value and the foreign key value in the second data table with the new primary key value corresponding to the ordered proxy column in the first data table, and partitioning the replaced second data table according to the parallel processing capacity to obtain at least two second partitions.
With reference to the second aspect and any one of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the obtaining module is specifically configured to:
and scanning the first Hash sub-table, and acquiring all data information in the associated rows in the first data table and the second data table corresponding to the query information as the data.
A third aspect of the embodiments of the present invention provides a data query system based on an OLAP system, including: the user terminal and the data query device based on the OLAP system provided by the second aspect.
According to the data query method, device and system based on the OLAP system, the query request of the user terminal is received, and the first data table is logically partitioned to obtain the first partitions. A connection mapping relationship is established that represents the correspondence between the boundary value of each first partition and its hash sub-table. The second data table is then partitioned to obtain the second partitions. During a multi-thread query, each thread obtains and scans the first hash sub-table through the connection mapping relationship to acquire the data, and the data is returned to the user terminal.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a first embodiment of a data query method based on an OLAP system according to the present invention;
FIG. 2 is a flowchart of a second embodiment of a data query method based on an OLAP system according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a data query apparatus based on an OLAP system according to the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of a data query system based on an OLAP system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a first embodiment of a data query method based on an OLAP system according to the present invention. The method is mainly applied to an OLAP-type database system, in the process of performing a joint query on multiple data tables to obtain data. As shown in Fig. 1, the method of this embodiment may include:
S101: receiving a data query request sent by a user terminal, wherein the data query request comprises query information, a first data table identifier and a second data table identifier.
In this embodiment, when the user terminal needs to acquire data, a data query request is sent to the database system, where the first data table and the second data table are associated data tables, and store associated data information.
S102: according to the data query request, performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions, and establishing a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes: the boundary value of the first partition and the corresponding hash sub-table.
In this embodiment, the first data table is divided into at least two first partitions, a hash sub-table is obtained for each first partition, and a connection mapping relationship is established to represent the correspondence between the boundary value of each first partition and the hash sub-table of that first partition.
The connection mapping relationship may be represented as a table or, optionally, in other forms such as a map, an array, or a set. The form may be selected according to the actual application environment and is not limited in the present invention.
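As one possible representation, the connection mapping relationship can be held as two parallel lists: the boundary values in ascending order and the hash sub-table for each partition. The following is a minimal Python sketch; the class name `ConnectionMapping` and its fields are hypothetical, and a plain `dict` stands in for a hash sub-table.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionMapping:
    """Hypothetical container pairing each boundary value with a hash sub-table."""
    # boundaries[i] is the boundary value of first partition i, in ascending order.
    boundaries: list = field(default_factory=list)
    # sub_tables[i] is the hash sub-table (a dict) built from first partition i.
    sub_tables: list = field(default_factory=list)

    def add_partition(self, boundary, sub_table):
        """Register one first partition; call in ascending boundary order."""
        self.boundaries.append(boundary)
        self.sub_tables.append(sub_table)


mapping = ConnectionMapping()
mapping.add_partition(10, {3: "row-3", 10: "row-10"})
mapping.add_partition(20, {14: "row-14", 20: "row-20"})
```

Keeping the boundary values sorted is a design choice of this sketch: it allows a later lookup by binary search rather than a one-by-one comparison.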
S103: and partitioning the second data table corresponding to the second data table identifier to obtain at least two second partitions, and obtaining a first foreign key value from the second partitions for each second partition, wherein the first foreign key value is within the boundary value range of one first partition.
In this embodiment, a second data table is obtained according to the second data table identifier and is partitioned to obtain at least two second partitions, where the number of second partitions may be the same as or different from the number of first partitions. A foreign key value selected from each second partition is compared with the boundary values of the first partitions one by one; if the foreign key value is within the boundary value range of a first partition, it is taken as the first foreign key value of that second partition.
In this embodiment, the boundary value range of a first partition is the range of values between the boundary value of the previous first partition and the boundary value of the current first partition.
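The range test described above can be sketched in Python with a binary search over sorted boundary values. This sketch assumes that each boundary value is the largest key of its partition, so partition i covers the keys greater than `boundaries[i-1]` and up to and including `boundaries[i]`; the text itself does not fix whether the boundary is inclusive.

```python
from bisect import bisect_left

# Hypothetical boundary values: the largest primary key of each first partition.
boundaries = [10, 20, 30]


def partition_index(foreign_key):
    """Return the index of the first partition whose range contains foreign_key."""
    i = bisect_left(boundaries, foreign_key)
    if i == len(boundaries):
        # The key is larger than every boundary, so no partition can hold it.
        raise KeyError(foreign_key)
    return i
```

For example, `partition_index(11)` returns 1 because 11 lies between the previous boundary 10 and the current boundary 20.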
In a relational database, each data table has several attributes. If a group of attributes can uniquely identify a row of the data table, that attribute group is the primary key of the table. For example, a student table includes the student number, name, sex and class; since the student number of each student is unique, the student number is the primary key. A foreign key is mainly used to associate a table with another data table. For example, a score table includes a student number, a course number and a score; the student number and the course number together determine the score, so the primary key of the score table is the pair (student number, course number). Because the student number in the score table corresponds to the student number in the student table, the student number in the score table is a foreign key referencing the student table.
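The student and score tables from this example can be written down as a small schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column names and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,          -- unique per student: the primary key
    name       TEXT,
    sex        TEXT,
    class_name TEXT
);
CREATE TABLE score (
    student_no INTEGER REFERENCES student(student_no),  -- foreign key to student
    course_no  INTEGER,
    score      REAL,
    PRIMARY KEY (student_no, course_no)      -- composite primary key
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Student A', 'F', 'Class 1')")
conn.execute("INSERT INTO score VALUES (1, 101, 92.5)")

# Join the two tables through the foreign key.
rows = conn.execute(
    "SELECT st.name, sc.course_no, sc.score "
    "FROM score AS sc JOIN student AS st ON st.student_no = sc.student_no"
).fetchall()
```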
S104: querying the connection mapping relationship to obtain a first hash sub-table corresponding to the first foreign key value.
In this embodiment, the connection mapping relationship is queried with the first foreign key value to obtain a first hash sub-table. Among the plurality of hash sub-tables, the first hash sub-table is the one corresponding to the boundary value of the first partition whose boundary value range contains the first foreign key value.
S105: and scanning the first Hash sub-table, acquiring data corresponding to the query information, and returning the data to the user terminal.
In the data query method based on the OLAP system of this embodiment, the query request of the user terminal is received, and the first data table is logically partitioned to obtain the first partitions. A connection mapping relationship is established that represents the correspondence between the boundary value of each first partition and its hash sub-table. The second data table is then partitioned to obtain the second partitions. During a multi-thread query, each thread obtains and scans the first hash sub-table through the connection mapping relationship to acquire the data, and the data is then returned to the user terminal.
Fig. 2 is a flowchart of a second embodiment of the data query method based on the OLAP system in the present invention, and as shown in fig. 2, on the basis of the above embodiment, a specific implementation manner of S102 includes the following steps:
S201: partitioning the first data table according to the inherent order of the first data table to obtain at least two first partitions.
In this embodiment, in the OLAP type database system, the primary keys of the first data table usually have a partial order relationship, that is, there is an inherent order, so that the first data table can be partitioned according to the inherent order, and the first data table is partitioned into at least two first partitions which are not intersected with each other.
S202: and for each first partition, carrying out hash operation according to the primary key of the first partition, and establishing a corresponding hash sub-table.
In this embodiment, for each of the divided first partitions, a hash operation is performed on the primary key values of that first partition to establish a hash sub-table. The hash sub-tables corresponding to the individual first partitions are independent of one another.
S203: and for each first partition, establishing a corresponding connection mapping relation according to the Hash sub-table corresponding to the first partition.
In this embodiment, the boundary value of each first partition is obtained, and each first partition has only one boundary value, that is, the first partition is identified by the boundary value, and the corresponding relationship between the boundary value and the hash sub-table corresponding to the first partition is obtained, so as to establish the connection mapping relationship.
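Steps S201 to S203 can be sketched together in Python. This is a minimal illustration assuming the first data table is a non-empty list of `(primary_key, row)` pairs already sorted by primary key, that partitions are of roughly equal size, and that the boundary value of a partition is its largest primary key; the function name is hypothetical.

```python
def build_connection_mapping(first_table, num_partitions):
    """Split a sorted table, hash each partition, and record its boundary value."""
    # Ceiling division so that at most num_partitions partitions are produced.
    size = max(1, -(-len(first_table) // num_partitions))
    boundaries, sub_tables = [], []
    for start in range(0, len(first_table), size):
        partition = first_table[start:start + size]   # S201: logical partition
        sub_tables.append(dict(partition))            # S202: dict as hash sub-table
        boundaries.append(partition[-1][0])           # S203: boundary = largest key
    return boundaries, sub_tables


first_table = [(k, f"row-{k}") for k in range(1, 11)]
boundaries, sub_tables = build_connection_mapping(first_table, 3)
```

Here ten rows split into three partitions give the boundary values 4, 8 and 10, and each sub-table contains only the keys of its own partition.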
In the data query method based on the OLAP system of this embodiment, the query request of the user terminal is received, the first data table is logically partitioned to obtain the first partitions, and the hash sub-table and the boundary value of each first partition are obtained. A connection mapping relationship is established that represents the correspondence between the boundary values of the first partitions and the hash sub-tables. The second data table is then partitioned to obtain the second partitions. During a multi-thread query, each thread obtains and scans the first hash sub-table through the connection mapping relationship to acquire the data, and the data is then returned to the user terminal.
On the basis of the foregoing embodiment, in S102, the performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions specifically includes the following two implementation manners:
In a first implementation manner, if the first data table is ordered, the first data table is logically partitioned according to the inherent order of the first data table, and at least two first partitions are obtained.
In this embodiment, if the primary key of the first data table has a certain fixed order (partial order relationship), the first data table is logically partitioned in the fixed order.
In a second implementation manner, if the first data table is unordered, an ordered proxy column is added to the first data table as a primary key column, and the first data table is logically partitioned according to the order of the ordered proxy column, so as to obtain at least two first partitions.
In this embodiment, if the first data table does not have a fixed order (partial order relationship), an ordered proxy column is added to the first data table. For example, an increasing sequence of numbers from 1 to the number of rows of the first data table is added as the ordered proxy column, and the first data table is then logically partitioned according to the added ordered proxy column.
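Adding an ordered proxy column can be sketched as follows in Python. The rows and the column name `proxy_id` are hypothetical; the sketch also builds a lookup from the original key to the proxy value, which is an assumption about how the new key would later be related to the old one.

```python
# Hypothetical unordered first data table keyed by "name".
rows = [{"name": "pear"}, {"name": "apple"}, {"name": "fig"}]

# Add an increasing sequence 1..len(rows) as the ordered proxy (primary key) column.
with_proxy = [{"proxy_id": i, **row} for i, row in enumerate(rows, start=1)]

# Lookup from the original key to the proxy value.
key_to_proxy = {row["name"]: row["proxy_id"] for row in with_proxy}
```

Because `proxy_id` is strictly increasing in row order, the table can now be cut into non-intersecting partitions by ranges of `proxy_id`.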
Further, in S103, the partitioning of the second data table corresponding to the second data table identifier to obtain at least two second partitions specifically includes the following three implementation manners:
In a first implementation manner, if the first data table is ordered and the second data table is ordered, the second data table is partitioned according to the inherent order of the second data table and the parallel processing capacity, so as to obtain at least two second partitions.
In a second implementation manner, if the first data table is ordered and the second data table is unordered, the second data table is partitioned according to the parallel processing capacity to obtain at least two second partitions.
In a third implementation manner, if the first data table is unordered and the second data table is unordered, the original key value and the foreign key value in the second data table are replaced by the new primary key value corresponding to the ordered proxy column in the first data table, and the replaced second data table is partitioned according to the parallel processing capacity, so that at least two second partitions are obtained.
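The third implementation manner can be sketched in Python as a key rewrite followed by slicing. The lookup `key_to_proxy`, the sample rows, and the thread count are hypothetical, and "parallel processing capacity" is modelled here simply as a number of threads.

```python
# Hypothetical lookup from original key to ordered proxy value in the first table.
key_to_proxy = {"pear": 1, "apple": 2, "fig": 3}

# Hypothetical second data table: (foreign_key, payload), unordered on the foreign key.
second_table = [("fig", "f0"), ("pear", "f1"), ("apple", "f2"), ("fig", "f3"), ("pear", "f4")]

# Replace each foreign key value with the new primary key value from the proxy column.
replaced = [(key_to_proxy[fk], payload) for fk, payload in second_table]

# Slice the replaced table into contiguous logical partitions, one per thread.
NUM_THREADS = 2
chunk = -(-len(replaced) // NUM_THREADS)  # ceiling division
second_partitions = [replaced[i:i + chunk] for i in range(0, len(replaced), chunk)]
```

The slices are views of row ranges only; no rows are moved or sorted, which matches a logical rather than a physical partition.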
In particular, consider the case where the first data table and the second data table have the same inherent order (that is, the same partial order relationship, which is common in practical applications). An example is the two commonly connected data tables lineitem and order in "TPC Benchmark™ H", where the records of the order table follow the same inherent order as the records of the lineitem table. In this case the order table and the lineitem table can be partitioned using the same boundary values, which ensures that the logical partitions of the two data tables correspond one to one, so that during the hash connection (join) the unique hash sub-table can be accessed without the connection mapping relationship.
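When both tables share the same order, the one-to-one partition correspondence can be sketched in Python as follows. The miniature tables only borrow the names of the TPC-H tables and are otherwise hypothetical, as are the boundary values and the helper `split_by_boundaries`.

```python
from bisect import bisect_right


def split_by_boundaries(rows, boundaries):
    """Cut rows (sorted by key in position 0) at the same boundary values."""
    keys = [row[0] for row in rows]
    parts, start = [], 0
    for boundary in boundaries:
        end = bisect_right(keys, boundary)  # include all rows with key <= boundary
        parts.append(rows[start:end])
        start = end
    return parts


# Hypothetical miniature tables, both sorted by order key.
orders = [(k, f"order-{k}") for k in range(1, 7)]
lineitems = [(k, f"item-{k}-{n}") for k in range(1, 7) for n in range(2)]
boundaries = [2, 4, 6]

joined = []
for order_part, item_part in zip(split_by_boundaries(orders, boundaries),
                                 split_by_boundaries(lineitems, boundaries)):
    sub_table = dict(order_part)  # hash sub-table for this partition only
    # Partition i of lineitem only ever needs partition i of orders.
    joined.extend((k, item, sub_table[k]) for k, item in item_part)
```

No boundary lookup is performed per row: the pairing of partitions by position replaces the connection mapping query.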
Specifically, in S105, the scanning the first hash sub-table to obtain the data corresponding to the query information includes: and scanning the first Hash sub-table, and acquiring all data information in the associated rows in the first data table and the second data table corresponding to the query information as the data.
The technical solution in the above embodiment is specifically described below with reference to an example. Specifically, the names and corresponding sexes and ages of 50 individuals are stored in the first data table, that is, the attributes of the first data table include name, gender and age, the primary key is name, and the first data table has an inherent order in which the initials of the names are arranged in the order of twenty-six letters, that is, rows are arranged in alphabetical order. The name, birth date and mobile phone number of 100 individuals are stored in a second data table associated with the first data table, and the name in the attributes of the second data table is a foreign key of the first data table.
The data query method based on the OLAP system comprises the following main processes: firstly, the first data table is partitioned into five first partitions according to the inherent sequence (initial name sequence) of the first data table, the boundary value of each first partition is the name of a boundary person of the first partition, and hash calculation is performed on each partitioned first partition to obtain a hash sub-table corresponding to each first partition. A connection mapping table is established, in which the boundary value (name, for example, one of the boundary values is zhangsan) of each first partition and the identifier of the corresponding hash sub-table are stored, i.e., the connection mapping table identifies the mapping relationship between the boundary value of each first partition and the corresponding hash sub-table.
Second, the foreign key column of the second data table, which refers to the first data table, is generally unordered. The second data table is therefore not physically partitioned; only logical horizontal partitioning is performed. The second data table is divided into three second partitions (three being the number of threads to be used; alternatively, the number of partitions may equal the maximum number of threads that the system can process, which can be selected according to the actual situation and is not limited in this application). All foreign key values that refer to the first data table are queried from each second partition, and a name that falls within the boundary value range of some first partition is taken as the first foreign key value; that is, a matching name is found. Third, the connection mapping table is queried according to the first foreign key value to obtain the corresponding hash sub-table. Each thread then independently scans its corresponding hash sub-table in the manner of the prior art and obtains all required data from the first data table and the second data table. The threads do not interact with each other during scanning.
Finally, the acquired data is returned to the user terminal, which completes the whole data query process.
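The probe side of the flow (logical slicing of the second data table, lookup in the connection mapping table, and scanning of a single hash sub-table per key) might look like the following Python sketch. All names and data are hypothetical, the boundary value is assumed to be the largest key of its partition, and a binary search over the boundary values is one possible way to perform the lookup; the embodiment does not prescribe it.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical connection mapping: upper boundary key of each first partition,
# and the hash sub-table (a dict here) built for that partition.
BOUNDARIES = ["liming", "wangwu", "zhaoliu"]
SUB_TABLES = [
    {"chenwei": 31, "liming": 28},
    {"lisi": 45, "wangwu": 39},
    {"zhangsan": 52, "zhaoliu": 24},
]

# Foreign key table: (order id, name as foreign key), unordered on the name.
ORDERS = [(1, "zhangsan"), (2, "lisi"), (3, "chenwei"),
          (4, "zhangsan"), (5, "nobody"), (6, "wangwu")]


def probe_slice(rows):
    """Join one logical slice of the foreign key table against the sub-tables."""
    joined = []
    for order_id, name in rows:
        idx = bisect_left(BOUNDARIES, name)   # first partition whose boundary >= name
        if idx == len(BOUNDARIES):
            continue                          # key lies beyond the last boundary
        age = SUB_TABLES[idx].get(name)       # probe only that one sub-table
        if age is not None:
            joined.append((order_id, name, age))
    return joined


def parallel_join(rows, num_threads):
    """Slice the table logically and let each thread probe its own slice."""
    size = max(1, -(-len(rows) // num_threads))
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(probe_slice, slices))   # results keep slice order
    return [row for part in parts for row in part]


result = parallel_join(ORDERS, 3)
```

The threads only read `BOUNDARIES` and `SUB_TABLES` and write to their own result lists, which mirrors the statement that the threads do not interact during scanning.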
In this embodiment, the first data table (the primary key table) is logically and horizontally decomposed into N first partitions, where N depends on the degree of parallelism that the platform can provide. Because of the inherent order on the primary key, these first partitions do not intersect, and each first partition can be distinguished by recording its boundary values. After the first data table (the primary key table) has been partitioned, the first partitions are scanned in parallel by different threads, and during the scan a hash function is applied to generate the hash sub-table of each first partition. A connection mapping relation structure records the mapping between each primary key boundary value and its hash sub-table. Each thread completes its own hash calculation independently, so there is no conflict between threads.
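A minimal Python sketch of this parallel build step follows, assuming an in-memory table and index ranges as the logical partitions. The names `logical_ranges` and `build_sub_table` and the sample rows are hypothetical. In CPython, threads illustrate the structure rather than real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical primary key table, already ordered on its key (first field).
PRIMARY = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")]


def logical_ranges(n_rows, n_parts):
    """Return (start, stop) index ranges; the table itself is not copied or moved."""
    size = max(1, -(-n_rows // n_parts))
    return [(s, min(s + size, n_rows)) for s in range(0, n_rows, size)]


def build_sub_table(bounds):
    """Scan one logical partition and hash its rows by primary key.

    Each call writes only to its own dict, so the threads need no locking.
    """
    start, stop = bounds
    sub_table = {PRIMARY[i][0]: PRIMARY[i] for i in range(start, stop)}
    boundary = PRIMARY[stop - 1][0]   # largest key in this partition
    return boundary, sub_table


ranges = logical_ranges(len(PRIMARY), 2)
with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    # One (boundary value, hash sub-table) pair per first partition.
    mapping = list(pool.map(build_sub_table, ranges))
```

The resulting list of pairs is one possible shape for the connection mapping relation structure: it is written once during the build and only read afterwards.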
The second data table (the foreign key table) in the connection is unordered on the foreign key column. It is therefore not physically partitioned; it is only logically sliced according to the degree of parallelism and scanned by multiple threads in parallel. For each first foreign key value that a thread reads, the thread first queries the connection mapping relation and uses it to determine the hash sub-table corresponding to that first foreign key value. Querying the connection mapping relation from multiple threads is a read-only operation and does not conflict with other operations.
In the data query method based on the OLAP system according to this embodiment of the invention, a query request is received from the user terminal, and the first data table is partitioned to obtain first partitions. The hash sub-table corresponding to each first partition and the boundary value of each first partition are obtained, and a connection mapping relation is established that represents the correspondence between the boundary values of the first partitions and the hash sub-tables. The second data table is then partitioned to obtain second partitions. During the multi-threaded query, each thread obtains the first hash sub-table through the connection mapping relation and independently scans the hash sub-table corresponding to each first foreign key value, so that the data is obtained and returned to the user terminal. This addresses the long multi-threaded query time and high server overhead of the prior art, effectively shortening the data query time and reducing server overhead.
Fig. 3 is a schematic structural diagram of an embodiment of a data query apparatus based on an OLAP system. As shown in fig. 3, the apparatus of this embodiment may include a transceiver module 31, a processing module 32 and an obtaining module 33. The transceiver module 31 is configured to receive a data query request sent by a user terminal, where the data query request includes query information, a first data table identifier and a second data table identifier. The processing module 32 is configured to perform partition processing on the first data table corresponding to the first data table identifier according to the data query request, obtain at least two first partitions, and establish a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes the boundary value of the first partition and the corresponding hash sub-table. The processing module 32 is further configured to perform partition processing on the second data table corresponding to the second data table identifier to obtain at least two second partitions, and to obtain, for each second partition, a first foreign key value from the second partition, where the first foreign key value is the same as a boundary value of one first partition. The obtaining module 33 is configured to query the connection mapping relationship and obtain the first hash sub-table corresponding to the first foreign key value. The obtaining module 33 is further configured to scan the first hash sub-table, obtain the data corresponding to the query information, and return the data to the user terminal through the transceiver module 31.
The data query apparatus based on the OLAP system provided in this embodiment may be used to execute the technical solution of the method embodiment shown in fig. 1. The transceiver module receives a query request from a user terminal. The processing module logically partitions the first data table to obtain first partitions, obtains the hash sub-table corresponding to each first partition and the boundary value of each first partition, and establishes a connection mapping relationship that represents the correspondence between the boundary values of the first partitions and the hash sub-tables; it then partitions the second data table to obtain second partitions. During the multi-threaded query, each thread obtains and scans the first hash sub-table through the connection mapping relationship, so that the data is obtained and returned to the user terminal. This addresses the long multi-threaded query time and high server overhead of the prior art, effectively shortening the data query time and reducing server overhead.
In an embodiment of the data query apparatus based on the OLAP system, based on the above embodiment, the processing module 32 is configured to:
if the first data table is ordered, perform logical partition processing on the first data table according to the inherent order of the first data table to obtain at least two first partitions;
or,
if the first data table is unordered, add an ordered proxy column to the first data table as the primary key column, and perform logical partition processing on the first data table according to the order of the ordered proxy column to obtain at least two first partitions.
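The unordered case can be illustrated with a short Python sketch. It assumes that the ordered proxy column is simply a running integer assigned in scan order; the embodiment does not state how the proxy values are generated, and the function names and sample rows are hypothetical.

```python
def add_ordered_proxy(rows):
    """Prefix each row with a proxy key 0, 1, 2, ... in scan order.

    Returns the augmented rows and a lookup from original key to proxy key.
    """
    augmented = []
    key_to_proxy = {}
    for proxy, row in enumerate(rows):
        augmented.append((proxy,) + row)   # proxy column becomes the new primary key
        key_to_proxy[row[0]] = proxy
    return augmented, key_to_proxy


def partition_by_proxy(augmented, n_parts):
    """Cut the table into contiguous partitions in proxy-key order."""
    size = max(1, -(-len(augmented) // n_parts))
    return [augmented[i:i + size] for i in range(0, len(augmented), size)]


# Illustrative table that is not ordered on its original key (the name).
unordered = [("wangwu", 39), ("chenwei", 31), ("zhangsan", 52), ("lisi", 45)]
augmented, key_to_proxy = add_ordered_proxy(unordered)
partitions = partition_by_proxy(augmented, 2)
proxy_boundaries = [part[-1][0] for part in partitions]   # largest proxy key per partition
```

Because the proxy column is ordered by construction, the partitions are disjoint ranges of proxy keys even though the original names are not sorted.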
Optionally, the processing module 32 is further configured to:
if the first data table is ordered and the second data table is ordered, partition the second data table according to the inherent order of the second data table and the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is ordered and the second data table is unordered, partition the second data table according to the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is unordered and the second data table is unordered, replace the original key values and foreign key values in the second data table with the new primary key values corresponding to the ordered proxy column in the first data table, and partition the replaced second data table according to the parallel processing capacity to obtain at least two second partitions.
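For the case in which both tables are unordered, the key replacement followed by slicing according to the parallel processing capacity could be sketched as below. The lookup `key_to_proxy` is assumed to have been produced when the ordered proxy column was added to the first data table; its contents, the function names, and the sample rows are hypothetical, and every foreign key is assumed to reference an existing row.

```python
def rewrite_foreign_keys(fk_rows, key_to_proxy):
    """Replace each original foreign key value with the proxy key it refers to."""
    return [(row_id, key_to_proxy[fk]) for row_id, fk in fk_rows]


def slice_by_parallelism(rows, n_threads):
    """Logically slice the table into at most n_threads contiguous parts."""
    size = max(1, -(-len(rows) // n_threads))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Assumed result of adding an ordered proxy column to the first data table.
key_to_proxy = {"wangwu": 0, "chenwei": 1, "zhangsan": 2, "lisi": 3}

# Second data table: (order id, original foreign key), unordered.
orders = [(1, "zhangsan"), (2, "wangwu"), (3, "lisi"),
          (4, "zhangsan"), (5, "chenwei")]

rewritten = rewrite_foreign_keys(orders, key_to_proxy)
second_partitions = slice_by_parallelism(rewritten, 2)
```

After the rewrite, the foreign key values are comparable with the proxy-key boundary values of the first partitions, so the same boundary lookup can be used as in the ordered case.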
Optionally, if the inherent order of the first data table and the second data table is the same, the processing module 32 is configured to partition the second data table according to the same inherent order as the first data table to obtain at least two second partitions, where the number of second partitions is the same as the number of first partitions.
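When both tables share the same order, one way to obtain the same number of partitions on both sides is to cut the second data table at the boundary values of the first partitions, so that second partition i only needs hash sub-table i. The Python sketch below assumes exactly that alignment, which the text does not spell out; the names and data are hypothetical.

```python
from bisect import bisect_right


def co_partition(fk_rows, boundaries):
    """Split fk_rows (sorted on the foreign key, field 0) at the given boundaries.

    Partition i receives every row whose key is greater than the previous
    boundary and not greater than boundaries[i].
    """
    keys = [row[0] for row in fk_rows]
    parts, start = [], 0
    for boundary in boundaries:
        stop = bisect_right(keys, boundary)   # rows with key <= boundary end here
        parts.append(fk_rows[start:stop])
        start = stop
    return parts


# Assumed boundary values and hash sub-tables of the first partitions.
boundaries = [20, 40]
sub_tables = [{10: "a", 20: "b"}, {30: "c", 40: "d"}]

# Second data table, ordered on the foreign key.
fact = [(10, "x1"), (10, "x2"), (20, "x3"), (30, "x4"), (40, "x5"), (40, "x6")]

parts = co_partition(fact, boundaries)
# Partition i is joined against sub-table i only; no mapping lookup per row.
joined = [[(k, v, sub_tables[i][k]) for k, v in part]
          for i, part in enumerate(parts)]
```

With aligned partitions, each thread can work on one pair of partitions in isolation, which is why the partition counts on the two sides match.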
Specifically, the obtaining module 33 is configured to scan the first hash sub-table and acquire, as the data, all data information in the associated rows of the first data table and the second data table that correspond to the query information.
The data query apparatus based on the OLAP system provided in this embodiment may be used to implement the technical solutions of any one of the first to third embodiments of the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
FIG. 4 is a schematic structural diagram of an embodiment of a data query system based on an OLAP system according to the present invention. As shown in fig. 4, the system includes: the user terminal 41 and the data query device 42 based on the OLAP system according to any one of the embodiments of the apparatus shown in fig. 3. The user terminal 41 is configured to send a data query message to the data query device 42 based on the OLAP system, and is configured to receive data returned by the data query device 42 based on the OLAP system. The data query device 42 based on the OLAP system is used for executing the technical solution of any one of the method embodiments in fig. 1, fig. 2 and the examples, and the implementation principle and the technical effect are similar, and are not described herein again.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be performed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A data query method based on an OLAP system is characterized by comprising the following steps:
receiving a data query request sent by a user terminal, wherein the data query request comprises query information, a first data table identifier and a second data table identifier;
according to the data query request, performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions, and establishing a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes: the boundary value of the first partition and the corresponding hash sub-table;
partitioning a second data table corresponding to the second data table identifier to obtain at least two second partitions, and obtaining a first foreign key value from each second partition, wherein the first foreign key value is within a boundary value range of one first partition;
querying the connection mapping relationship to obtain a first hash sub-table corresponding to the first foreign key value;
and scanning the first hash sub-table, acquiring data corresponding to the query information, and returning the data to the user terminal.
2. The method according to claim 1, wherein performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions and establish a connection mapping relationship includes:
partitioning the first data table according to the inherent order of the first data table to obtain at least two first partitions;
for each first partition, performing a hash operation according to the primary key of the first partition, and establishing a corresponding hash sub-table;
and for each first partition, establishing a corresponding connection mapping relationship according to the hash sub-table corresponding to the first partition.
3. The method according to claim 1 or 2, wherein performing logical partition processing on the first data table corresponding to the first data table identifier to obtain at least two first partitions includes:
if the first data table is ordered, performing logical partition processing on the first data table according to the inherent order of the first data table to obtain at least two first partitions;
or,
if the first data table is unordered, adding an ordered proxy column to the first data table as the primary key column, and performing logical partition processing on the first data table according to the order of the ordered proxy column to obtain at least two first partitions.
4. The method according to claim 3, wherein the partitioning the second data table corresponding to the second data table identifier to obtain at least two second partitions includes:
if the first data table is ordered and the second data table is ordered, partitioning the second data table according to the inherent order of the second data table and the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is ordered and the second data table is unordered, partitioning the second data table according to the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is unordered and the second data table is unordered, replacing the original key values and foreign key values in the second data table with the new primary key values corresponding to the ordered proxy column in the first data table, and partitioning the replaced second data table according to the parallel processing capacity to obtain at least two second partitions.
5. The method according to any one of claims 1 to 4, wherein the scanning the first hash sub-table to obtain data corresponding to the query information comprises:
scanning the first hash sub-table, and acquiring, as the data, all data information in the associated rows of the first data table and the second data table that correspond to the query information.
6. A data query device based on an OLAP system is characterized by comprising:
a transceiver module, configured to receive a data query request sent by a user terminal, wherein the data query request comprises query information, a first data table identifier and a second data table identifier;
a processing module, configured to perform logical partition processing on the first data table corresponding to the first data table identifier according to the data query request, obtain at least two first partitions, and establish a corresponding connection mapping relationship for each first partition, where the connection mapping relationship includes: the boundary value of the first partition and the corresponding hash sub-table;
the processing module is further configured to perform partition processing on a second data table corresponding to the second data table identifier, acquire at least two second partitions, and acquire, for each second partition, a first foreign key value from the second partition, where the first foreign key value is within a boundary value range of one first partition;
an obtaining module, configured to query the connection mapping relationship and obtain a first hash sub-table corresponding to the first foreign key value;
the obtaining module is further configured to scan the first hash sub-table, acquire data corresponding to the query information, and return the data to the user terminal through the transceiver module.
7. The apparatus of claim 6, wherein the processing module is specifically configured to:
partition the first data table according to the inherent order of the first data table to obtain at least two first partitions;
for each first partition, perform a hash operation according to the primary key of the first partition, and establish a corresponding hash sub-table;
and for each first partition, establish a corresponding connection mapping relationship according to the hash sub-table corresponding to the first partition.
8. The apparatus of claim 6 or 7, wherein the processing module is configured to:
if the first data table is ordered, perform logical partition processing on the first data table according to the inherent order of the first data table to obtain at least two first partitions;
or,
if the first data table is unordered, add an ordered proxy column to the first data table as the primary key column, and perform logical partition processing on the first data table according to the order of the ordered proxy column to obtain at least two first partitions.
9. The apparatus of claim 8, wherein the processing module is further configured to:
if the first data table is ordered and the second data table is ordered, partition the second data table according to the inherent order of the second data table and the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is ordered and the second data table is unordered, partition the second data table according to the parallel processing capacity to obtain at least two second partitions;
or,
if the first data table is unordered and the second data table is unordered, replace the original key values and foreign key values in the second data table with the new primary key values corresponding to the ordered proxy column in the first data table, and partition the replaced second data table according to the parallel processing capacity to obtain at least two second partitions.
10. The apparatus according to any one of claims 6 to 9, wherein the obtaining module is specifically configured to:
scan the first hash sub-table, and acquire, as the data, all data information in the associated rows of the first data table and the second data table that correspond to the query information.
11. A data query system based on an OLAP system, comprising: a user terminal and the OLAP system-based data query device of any one of claims 6-10.
CN201410228109.0A 2014-05-27 2014-05-27 Data query method, apparatus and system based on OLAP system Active CN103995879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410228109.0A CN103995879B (en) 2014-05-27 2014-05-27 Data query method, apparatus and system based on OLAP system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410228109.0A CN103995879B (en) 2014-05-27 2014-05-27 Data query method, apparatus and system based on OLAP system

Publications (2)

Publication Number Publication Date
CN103995879A true CN103995879A (en) 2014-08-20
CN103995879B CN103995879B (en) 2017-12-15

Family

ID=51310044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410228109.0A Active CN103995879B (en) 2014-05-27 2014-05-27 Data query method, apparatus and system based on OLAP system

Country Status (1)

Country Link
CN (1) CN103995879B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016165525A1 (en) * 2015-04-16 2016-10-20 华为技术有限公司 Data query method in crossing-partition database, and crossing-partition query device
CN107085570A (en) * 2016-02-14 2017-08-22 华为技术有限公司 Data processing method, application server and router
CN107229692A (en) * 2017-05-19 2017-10-03 哈工大大数据产业有限公司 A kind of distributed multi-table connecting method and system based on streamline
CN107729500A (en) * 2017-10-20 2018-02-23 锐捷网络股份有限公司 A kind of data processing method of on-line analytical processing, device and background devices
WO2018040722A1 (en) * 2016-08-31 2018-03-08 华为技术有限公司 Table data query method and device
CN107818117A (en) * 2016-09-14 2018-03-20 阿里巴巴集团控股有限公司 A kind of method for building up of tables of data, online query method and relevant apparatus
WO2018090557A1 (en) * 2016-11-18 2018-05-24 华为技术有限公司 Method and device for querying data table
CN108427684A (en) * 2017-02-14 2018-08-21 华为技术有限公司 Data query method, apparatus and computing device
CN108874873A (en) * 2018-04-26 2018-11-23 北京空间科技信息研究所 Data query method, apparatus, storage medium and processor
CN108959330A (en) * 2017-05-26 2018-12-07 阿里巴巴集团控股有限公司 A kind of processing of database, data query method and apparatus
CN109189808A (en) * 2018-09-18 2019-01-11 腾讯科技(深圳)有限公司 Data query method and relevant device
CN109582694A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 A kind of method and Related product generating data query script
CN109885574A (en) * 2019-02-22 2019-06-14 广州荔支网络技术有限公司 A kind of data query method and device
CN110083658A (en) * 2019-03-11 2019-08-02 北京达佳互联信息技术有限公司 Method of data synchronization, device, electronic equipment and storage medium
CN110287213A (en) * 2019-07-03 2019-09-27 中通智新(武汉)技术研发有限公司 Data query method, apparatus and system based on OLAP system
CN112597248A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 Big data partition storage method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221510A1 (en) * 2010-03-31 2012-08-30 International Business Machines Corporation Method and system for validating data
CN102663117A (en) * 2012-04-18 2012-09-12 中国人民大学 OLAP (On Line Analytical Processing) inquiry processing method facing database and Hadoop mixing platform
CN103235793A (en) * 2013-04-01 2013-08-07 华为技术有限公司 On-line data processing method, equipment and system
CN103309958A (en) * 2013-05-28 2013-09-18 中国人民大学 OLAP star connection query optimizing method under CPU and GPU mixing framework
CN103324724A (en) * 2013-06-26 2013-09-25 华为技术有限公司 Method and device for processing data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱阅岸 等: ""一种基于三元组存储的列式OLAP查询执行引擎"", 《软件学报》 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016165525A1 (en) * 2015-04-16 2016-10-20 华为技术有限公司 Data query method in crossing-partition database, and crossing-partition query device
CN106156168A (en) * 2015-04-16 2016-11-23 华为技术有限公司 The method of data is being inquired about and across subregion inquiry unit in partitioned data base
CN106156168B (en) * 2015-04-16 2019-10-22 华为技术有限公司 Across the method and across subregion inquiry unit for inquiring data in partitioned data base
CN107085570A (en) * 2016-02-14 2017-08-22 华为技术有限公司 Data processing method, application server and router
CN107784044B (en) * 2016-08-31 2020-02-14 华为技术有限公司 Table data query method and device
WO2018040722A1 (en) * 2016-08-31 2018-03-08 华为技术有限公司 Table data query method and device
CN107818117A (en) * 2016-09-14 2018-03-20 阿里巴巴集团控股有限公司 A kind of method for building up of tables of data, online query method and relevant apparatus
CN107818117B (en) * 2016-09-14 2022-02-15 阿里巴巴集团控股有限公司 Data table establishing method, online query method and related device
WO2018090557A1 (en) * 2016-11-18 2018-05-24 华为技术有限公司 Method and device for querying data table
CN108073641A (en) * 2016-11-18 2018-05-25 华为技术有限公司 The method and apparatus for inquiring about tables of data
CN108073641B (en) * 2016-11-18 2020-06-16 华为技术有限公司 Method and device for querying data table
CN108427684A (en) * 2017-02-14 2018-08-21 华为技术有限公司 Data query method, apparatus and computing device
CN108427684B (en) * 2017-02-14 2020-12-25 华为技术有限公司 Data query method and device and computing equipment
CN107229692A (en) * 2017-05-19 2017-10-03 哈工大大数据产业有限公司 A kind of distributed multi-table connecting method and system based on streamline
CN107229692B (en) * 2017-05-19 2018-05-01 哈工大大数据产业有限公司 A kind of distributed multi-table connecting method and system based on assembly line
CN108959330A (en) * 2017-05-26 2018-12-07 阿里巴巴集团控股有限公司 A kind of processing of database, data query method and apparatus
CN109582694A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 A kind of method and Related product generating data query script
CN107729500A (en) * 2017-10-20 2018-02-23 锐捷网络股份有限公司 A kind of data processing method of on-line analytical processing, device and background devices
CN108874873A (en) * 2018-04-26 2018-11-23 北京空间科技信息研究所 Data query method, apparatus, storage medium and processor
CN108874873B (en) * 2018-04-26 2022-04-12 北京空间科技信息研究所 Data query method, device, storage medium and processor
CN109189808A (en) * 2018-09-18 2019-01-11 腾讯科技(深圳)有限公司 Data query method and relevant device
CN109885574A (en) * 2019-02-22 2019-06-14 广州荔支网络技术有限公司 A kind of data query method and device
CN110083658A (en) * 2019-03-11 2019-08-02 北京达佳互联信息技术有限公司 Method of data synchronization, device, electronic equipment and storage medium
CN110287213A (en) * 2019-07-03 2019-09-27 中通智新(武汉)技术研发有限公司 Data query method, apparatus and system based on OLAP system
CN110287213B (en) * 2019-07-03 2023-02-17 中通智新(武汉)技术研发有限公司 Data query method, device and system based on OLAP system
CN112597248A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 Big data partition storage method and device
CN112597248B (en) * 2020-12-26 2024-04-12 中国农业银行股份有限公司 Big data partition storage method and device

Also Published As

Publication number Publication date
CN103995879B (en) 2017-12-15

Similar Documents

Publication Publication Date Title
CN103995879B (en) Data query method, apparatus and system based on OLAP system
US11132346B2 (en) Information processing method and apparatus
CN103703467B (en) Method and apparatus for storing data
DE112012005533B4 (en) Supporting query and a query
US20200272610A1 (en) Method, apparatus, device and medium for storing and querying data
CN110674154B (en) Spark-based method for inserting, updating and deleting data in Hive
US20180239800A1 (en) Data query method and apparatus
US20160092527A1 (en) Data processing apparatus and data mapping method thereof
US9934324B2 (en) Index structure to accelerate graph traversal
JP6608972B2 (en) Method, device, server, and storage medium for searching for group based on social network
CN105260464B (en) The conversion method and device of data store organisation
CN106407360B (en) Data processing method and device
CN107862047B (en) Natural person data processing method and system based on multiple data sources
CN103810224A (en) Information persistence and query method and device
US9330159B2 (en) Techniques for finding a column with column partitioning
CN109145003B (en) Method and device for constructing knowledge graph
WO2013187816A1 (en) Method and a consistency checker for finding data inconsistencies in a data repository
CN116521956A (en) Graph database query method and device, electronic equipment and storage medium
CN105637489A (en) Asynchronous garbage collection in a distributed database system
CN112905587B (en) Database data management method and device and electronic equipment
CN107169003B (en) Data association method and device
CN109033248B (en) Method and device for storing data record and method and device for inquiring data record
CN110955712A (en) Development API processing method and device based on multiple data sources
US8407255B1 (en) Method and apparatus for exploiting master-detail data relationships to enhance searching operations
CN111400301A (en) Data query method, device and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant