WO2020248604A1 - 一种基于分组向量的哈希多表连接实现方法 - Google Patents
一种基于分组向量的哈希多表连接实现方法 Download PDFInfo
- Publication number
- WO2020248604A1 WO2020248604A1 PCT/CN2020/071885 CN2020071885W WO2020248604A1 WO 2020248604 A1 WO2020248604 A1 WO 2020248604A1 CN 2020071885 W CN2020071885 W CN 2020071885W WO 2020248604 A1 WO2020248604 A1 WO 2020248604A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- grouping
- vector
- connection
- hash
- group
- Prior art date
Links
- 239000013598 vector Substances 0.000 title claims abstract description 374
- 238000000034 method Methods 0.000 title claims abstract description 73
- 230000002776 aggregation Effects 0.000 claims abstract description 136
- 238000004220 aggregation Methods 0.000 claims abstract description 136
- 238000004364 calculation method Methods 0.000 claims abstract description 134
- 238000005192 partition Methods 0.000 claims description 31
- 230000006835 compression Effects 0.000 claims description 23
- 238000007906 compression Methods 0.000 claims description 23
- 238000013507 mapping Methods 0.000 claims description 22
- 238000005259 measurement Methods 0.000 claims description 17
- 238000001514 detection method Methods 0.000 claims description 7
- 230000007246 mechanism Effects 0.000 claims description 6
- 238000003860 storage Methods 0.000 description 10
- 230000008569 process Effects 0.000 description 5
- 230000005540 biological transmission Effects 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24544—Join order optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2308—Concurrency control
- G06F16/2336—Pessimistic concurrency control approaches, e.g. locking or multiple versions without time stamps
- G06F16/2343—Locking methods, e.g. distributed locking or locking implementation details
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24558—Binary matching operations
- G06F16/2456—Join operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Definitions
- the invention relates to a method for realizing hash multi-table connection based on grouping vectors, in particular to a multi-table connection method optimized for data distribution characteristics in OLAP applications.
- Analytical SQL queries mainly include selection, projection, connection, grouping, and aggregation operations.
- the query is usually expressed as the projection of the GROUP-BY attribute on the dimension table according to the given selection criteria, and the connection with the foreign key of the fact table Then perform grouping and aggregation calculations on the specified attributes of the fact table measurement columns.
- the selection and projection operations in the query tree of the traditional relational database are pushed down to the underlying table nodes, and then the connection operations are executed in sequence according to the query tree path, the connection result records are generated iteratively, and the connection result records are hashed and grouped.
- the execution process of the following SQL query command is shown in Figure 1:
- the where clause in the SQL command selects the records that meet the conditions in the customer and supplier tables and projects the GROUP-BY attribute and the corresponding connection attribute, and then creates the corresponding hash table for the connection operation.
- the fact table in the SQL command projects the corresponding attributes and performs the connection operation with the customer table and the supplier table in turn to generate the connection record with the GROUP-BY grouping attribute and the aggregation attribute, and finally the aggregation calculation is performed through the hash grouping operation.
- the fact table records are sequentially connected to the hash table generated by the customer table.
- the records that meet the first connection condition are then connected to the hash table generated by the supplier table. Only the records that meet all the connection conditions participate in the final hash.
- Group aggregation calculation In the connection phase, only the foreign key attributes of the fact table participate in the calculation.
- the grouping and metric attributes are transmitted between the query trees but the records are discarded when the connection conditions are not met, resulting in invalid data transmission costs. The greater the number of connection tables and the lower the selection rate, the greater the cost of invalid data transmission.
- the hash grouping on the GROUP-BY attribute due to the one-to-many relationship between the dimension table and the fact table record, the hash grouping on the GROUP-BY attribute generates a large number of repeated hash calculation costs.
- the GROUP-BY attribute is relatively high
- the hash calculation cost is higher for long character data.
- the purpose of the present invention is to provide a method for implementing hash multi-table connection based on grouping vector.
- the connection and aggregation calculation are divided into two independent calculation stages through the post-materialization technology based on vector index.
- a vector index supporting group vector aggregation calculation is created in the connection stage, and asynchronous pipeline processing between large-granularity connection and aggregation computing load is realized through vector index.
- a method for implementing hash multi-table connection based on packet vector which includes the following steps:
- the selection-projection-grouping-connection operation subtask is used to create the grouping vector corresponding to the GROUP-BY statement in the generated SQL query command, and the vector index is created as the output result of the selection-projection-grouping-connection operation subtask;
- the aggregation operation subtask performs aggregation calculation based on vector index, and stores the aggregation calculation result in the unit corresponding to the grouping vector aggregator with the same length as the grouping vector;
- the selection, projection, and grouping operations are applied to the dimension table to create the dimension table hash table of the connection operation;
- the connection operation Perform a multi-table join operation between the foreign key column of the fact table and the hash table of the dimension table, and generate a vector index as the output result of the join operation;
- the aggregation operation subtask executes the fact table measurement column based on the vector index The aggregation calculation task is obtained, and the aggregation calculation result is obtained.
- the vector index is a vector data structure with the same length as the group aggregation calculation table, and the empty value unit in the vector index indicates that the record in the table corresponding to the unit does not participate in the group aggregation calculation.
- a non-empty unit indicates that the record in the table corresponding to the unit participates in the group aggregation calculation;
- the unit value of the vector index refers to the grouping vector, and the grouping vector corresponds to the grouping attribute set generated by the GROUP-BY sentence, and the grouping attribute
- Each grouping attribute member in the set corresponds to a grouping vector unit, the grouping attribute member ID is set to the subscript address of the grouping vector, and the corresponding grouping vector unit address is recorded in the non-empty unit of the vector index.
- the unit value of the index is directly mapped to the unit corresponding to the grouping vector for aggregation calculation.
- the connection operation for creating the vector index includes three implementation methods: a hash pre-grouping connection method, a connection method based on multi-dimensional grouping mapping, and a connection method based on sparse multi-dimensional array compression.
- the hash pre-grouping connection method includes the following steps: after the hash multi-table connection operation is performed, a hash table is created for the GROUP-BY grouping attribute of the generated connection record, and the hash grouping operation is performed; creating a global Sequence generator, starting from 0 with a step size of 1, assigning a unique grouping vector ID to the vector index; performing hash detection on the GROUP-BY grouping attribute of each generated connection record, and creating a vector based on the hash detection result Index, specific: if the same GROUP-BY group attribute value is not detected in the hash table, create a hash record, obtain the current sequence generator value from the global sequence generator as the ID of the current group attribute value, and then sequence The generator value is increased by 1, and the grouping attribute value is written into the corresponding unit of the grouping vector corresponding to the current grouping attribute value ID, and the grouping attribute value ID is written into the unit of the corresponding position in the vector index, or the location ID is recorded And the group attribute value
- connection method based on multi-dimensional grouping mapping includes the following steps: after selecting, projecting, and grouping dimension tables, performing dynamic dictionary table compression on the projected group attribute values, and mapping the grouping attributes into a dictionary vector ,
- the dictionary vector subscript is used as the grouping attribute ID on the dimension table.
- each grouping attribute combination value is mapped to a vector unit, and the primary key value of the dimension record and the grouping ID are obtained Dimension table hash table; perform the multi-table join operation between the foreign key column of the fact table and the dimension table hash table to obtain multiple dimension table group ID combination values, and map multiple dimension table group IDs into a multi-dimensional array.
- the group ID of a dimension table represents a subscript on one dimension of a multi-dimensional array, and the result of multi-table connection corresponds to a multi-dimensional array address; the multi-dimensional array is further mapped to a one-dimensional vector, and the multi-dimensional array address is converted to a one-dimensional vector subscript to obtain the vector index. Record the connection result in the vector index.
- connection method based on sparse multi-dimensional array compression includes the following steps: after selecting, projecting, and grouping dimension tables, performing dynamic dictionary table compression on the projected grouping attribute values, and mapping the grouping attributes to a dictionary Vector, dictionary vector subscript is used as the grouping attribute ID on the dimension table.
- dictionary vector subscript is used as the grouping attribute ID on the dimension table.
- the dimension table hash table perform the multi-table join operation between the foreign key column of the fact table and the dimension table hash table to obtain the combination value of multiple dimension table group IDs, and map the multiple dimension table group IDs to a multi-dimensional array,
- Each dimension table grouping ID represents a subscript on one dimension of a multi-dimensional array, and the multi-table connection result corresponds to a multi-dimensional array address;
- the multi-dimensional array generated in the connection stage is mapped to a one-dimensional sparse grouping vector, and the one-dimensional sparse
- the grouping vector creates a dense grouping vector, that is, a sequence generator is used to assign a unique sequence to each non-empty vector unit during mapping; finally, a vector index is created based on the dense grouping vector for the aggregation calculation operation on the measurement data column of the fact table .
- a multi-core parallel aggregation calculation method based on vector index is adopted, and the specific implementation method is: if the vector index is a fixed-length vector, the vector index partition is logically divided according to the number of threads, and the same position logic Divide the fact data partition, each thread scans the vector index partition in parallel and executes the vector aggregation operation on the corresponding fact data partition; if the vector index is in a compressed format, the method of balancing the logical partitions in the compressed vector index is adopted, and the compressed vector index is used The partition scans the fact table data partition for aggregation calculation, which simplifies the location calculation of the logical partition and balances the aggregation calculation cost on the logical partition corresponding to each thread.
- each thread uses a private group vector to complete the group aggregation calculation operation on the logical fact data partition of the thread. After each thread is processed, the inter-thread execution is executed.
- the merging calculation of the grouping vector generates the final global grouping vector; the second is that multi-threads share a unified grouping vector, and perform global grouping aggregation calculation based on the concurrency control mechanism, and when the size of the grouping vector exceeds the optimal threshold, the global grouping is used
- the aggregation calculation method when the size of the packet vector does not exceed the optimal threshold, the private packet vector aggregation calculation method is adopted, where the optimal threshold is 75% of the size of the last-level cache unit of each core of the CPU thread.
- the method of combining the aggregation calculation result in the grouping vector aggregator with the grouping vector metadata created by the connection operation subtask is: when the hash pre-grouping connection method is adopted: grouping the vector The address of the processor unit is mapped to the group vector in the connection phase to obtain the GROUP-BY attribute value mapped by the address, and combined with the aggregation calculation result stored in the group vector aggregator, as the output result of the SQL query command;
- the address of the grouping vector aggregator unit is mapped to the address of a multi-dimensional array, and then the corresponding grouping vector of the dimension table is respectively accessed and the corresponding GROUP-BY attribute value is parsed, and then combined into a SQL query The output of the command;
- the non-empty unit of the grouping vector is converted into a multi-dimensional array subscript according to the vector address, and each subscript value is mapped to the dictionary table vector of each dimension table, and each GROUP-BY attribute value is accessed. Access the aggregation calculation result in the group vector aggregator unit according to the address value of the group vector aggregator stored in the group vector unit, and generate the output result of the SQL query command.
- the present invention has the following advantages: 1. In OLAP applications, the amount of measurement data in the fact table is extremely large, and the size of the connection foreign key column is determined by the mode, and the amount of data is relatively small.
- the present invention divides the connection and aggregation calculation into two independent calculation processes through the vector index.
- the huge fact table measurement data can be stored separately and the aggregation calculation is completed based on the vector index.
- the specific semantics of the query cannot be obtained in the aggregation calculation stage. It is not possible to perform semantic analysis on aggregate calculation results, so that the storage (non-semantic numerical data) and calculation (non-semantic numerical calculation) of fact table measurement data have higher security and reduce the security risks of enterprise data information. 2.
- the present invention improves the independence of aggregation calculation by decoupling connection operation and aggregation calculation operation.
- connection operation and aggregation calculation operation can be optimally distributed on different storage and computing platforms to make the operation more efficient.
- the characteristics and the calculated characteristics are optimally matched. Therefore, the present invention can be widely used in the field of OLAP data query.
- Figure 1 is the execution process of SQL query commands in the traditional iterative pipeline query processing model
- Figure 2 is a schematic diagram of the data structure of a vector index
- Figure 3 is an implementation case of the hash pre-grouping connection method of the present invention.
- Figure 4 is an example of the connection method based on multi-dimensional packet mapping of the present invention.
- Figure 5 is an application example of the connection method based on sparse multi-dimensional array compression of the present invention.
- FIG. 6 is a schematic diagram of the present invention using a private group vector method to complete a global group aggregation calculation operation
- FIG. 7 is a schematic diagram of the present invention adopting the method of multi-thread sharing and unified grouping vector to complete the global grouping aggregation calculation operation.
- the present invention finds that improving query performance can be optimized from the following aspects:
- connection stage is the fact table record ID and grouping vector unit address that meets the connection conditions, which realizes the pre-grouping operation in the connection stage, so that in the aggregation calculation stage, it is possible to efficiently access the fact table measurement attributes according to the fact table record ID position and directly Perform aggregation calculations based on grouping vectors;
- connection and aggregation computing is a materialized vector index, which eliminates the strong association relationship between records in traditional query processing, thereby supporting the distribution of connection and aggregation computing on different computing platforms, and supports optimization for data size and computing characteristics Distribution strategy.
- Pre-grouping is completed in the multi-table connection stage by decoupling the grouping and aggregation operations in OLAP query processing.
- the grouping is stored in a vector structure.
- the result of the pre-grouping in the connection operation is stored in the vector index.
- the non-empty unit in the vector index is The subscript address of the group corresponding to the connection record in the grouping vector, the huge fact table can directly perform aggregation calculation based on the vector index, the aggregation calculation result is stored in the grouping vector aggregator, and finally the OLAP query result set is generated through vector analysis.
- the present invention optimizes the OLAP query processing performance on the one hand, and on the other hand makes the multi-table join operation and the aggregation calculation into two independent calculation processes, so that the join operation is based on the calculation-intensive load on the smaller join column, while the aggregation calculation is It becomes a data-intensive load based on a larger data set, making it easier for loads with two different storage and computing characteristics to be optimally configured on heterogeneous processors with different storage capacities and computing capabilities.
- the present invention provides a method for implementing hash multi-table connection based on packet vector, which includes the following steps:
- connection operation selection-projection-grouping-connection
- connection operation subtask is used to create the grouping vector corresponding to the GROUP-BY statement in the generated SQL query command, and create the vector index as the output result of the connection operation subtask;
- the aggregation operation subtask performs aggregation calculation based on vector index, and stores the aggregation calculation result in the unit corresponding to the grouping vector aggregator with the same length as the grouping vector;
- step 1) the "select, project, group” operation is applied to the dimension table, and the hash table structure of the "join” operation stage is created, and the "join” operation performs the difference between the foreign key column of the fact table and the dimension table hash table.
- Multi-table join operation between generate vector index as the output result of join operation subtask;
- aggregation operation subtask executes the aggregation calculation task based on vector index on the measurement column of fact table, that is, execute select VecInx, sum(%)from FactTable FT group by VecInx; type of aggregation calculation, where VecInx represents the vector index column created by the subtasks of the "select, project, group, and connect” operations.
- the vector index is an index structure used for group aggregation calculation.
- the vector index is a vector data structure with the same length as the group aggregation calculation table (such as lineorder).
- the empty value unit in the vector index indicates that the record in the table corresponding to the unit does not participate in the group aggregation calculation, and the non-empty unit is Indicates that the records in the table corresponding to the unit participate in the grouping aggregation calculation;
- the vector index unit value refers to the grouping vector, the grouping vector corresponds to the grouping attribute set generated by the GROUP-BY statement, and each grouping attribute member in the grouping attribute set corresponds to a grouping vector unit ,
- the ID value of the grouping vector unit is set to the grouping vector subscript address, and the corresponding grouping vector unit address (ie the ID value of the grouping vector unit) is recorded in the non-empty unit of the vector index, and the value of the vector index unit is directly mapped to the corresponding grouping vector.
- a compressed vector index structure can be adopted, that is, a fixed-length vector index is converted into a variable-length [FID, GID] two-tuple sequence, and FID corresponds to the record position of the grouping aggregation calculation table ID and GID are grouping vector IDs, which eliminate the storage and access costs caused by too many null units in the vector index through compressed storage.
- connection operation of generating the vector index can adopt three implementation methods:
- create a hash record Obtain the ID value of the current global sequence generator from the global sequence generator as the ID of the current group value, increase the value of the sequence generator by 1, and write the group attribute value into the corresponding unit of the group vector corresponding to the current group attribute value ID At the same time, write the group attribute value ID into the unit of the corresponding position in the vector index, or append the fact record location ID and the group attribute value ID to the compressed vector index; when the connection record is found in the hash detection, the same GROUP- When BY grouping attributes, store the ID corresponding to the grouping attribute value of the hash record in the unit corresponding to the vector index or add a fact record location ID and the grouping attribute value ID to the two-tuple record in the compressed vector index; where When obtaining the current value of the sequence generator, it is necessary to use the latch to lock the sequence value. After the value is increased by 1, the latch lock is released to ensure the uniqueness of the sequence assignment.
- This method decomposes the traditional join operation into two stages of join grouping and aggregation calculation.
- a grouping vector is created for the GROUP-BY attribute, and a vector index is created for the aggregation calculation, so that subsequent aggregation calculations can be executed independently.
- the aggregation calculation based on the grouping vector ID shields the group attribute semantics of the query, so that the aggregation calculation performed on the fact table eliminates the query semantics, which is beneficial to the protection of the user's query semantics.
- the lineorder table projects the L_CK and L_SK columns to perform query processing.
- the connection result with the lineorder record ID is generated, the grouping vector is created for the GROUP-BY attribute value in the connection result, and the GROUP-BY
- the attribute value is mapped to an array grouping vector.
- Create a global sequence generator which is a sequence with a step length of 1 increasing as the query group ID. The initial value is 0.
- the latch data structure is used to lock when reading, and the current value is assigned to the applied hash record. Then the sequence value is increased by 1 and the lock is released to ensure that each sequence value is uniquely assigned.
- a hash table is created for the GROUP-BY attribute.
- the current sequence value is obtained from the global sequence generator and written into the hash record.
- the sequence generator value 0 is obtained and written into the hash table
- the GROUP-BY attribute value (Brazil, Japan) is written into the grouping vector unit corresponding to the current sequence value [0] in.
- the grouping vector unit address [0] is recorded in the vector index of the same length as the fact table
- the vector index subscript corresponds to the fact record ID
- the vector index unit records the subscript value of the GROUP-BY attribute in the grouping vector [0] .
- the vector index adopts a compressed format
- the fact record ID and group ID [0,0] of the non-empty vector index unit are recorded through the two-tuple [FID, GID].
- the present invention further pushes down the pre-grouping from the connection end to the dimension table access stage, and creates a grouping vector based on the multi-dimensional array.
- the grouped attribute values projected after filtering are compressed by the dynamic dictionary table, and the grouped attributes are mapped to a dictionary vector, dictionary vector
- the subscript is used as the grouping attribute ID on the dimension table.
- each grouping attribute combination value is mapped to a vector unit.
- the dimension table hash table stores the primary key value of the dimension record and the group attribute ID.
- the connection result is the combined value of multiple dimension table grouping attribute ID, and the multiple dimension table grouping attribute ID combination value is mapped to a multi-dimensional array.
- Each dimension table grouping attribute ID represents the subscript on one dimension of the multi-dimensional array, and the result of multi-table connection Corresponds to a multi-dimensional array address.
- the multi-dimensional array is further mapped to a one-dimensional vector, and the address of the multi-dimensional array is converted into a one-dimensional vector index to obtain the vector index, and the connection result is recorded in the vector index.
- the multi-dimensional array mapping mechanism is suitable for situations where the grouping attribute of the dimension table is small, the corresponding multi-dimensional array is small, or the usage rate of the multi-dimensional array unit is high.
- the mapped multi-dimensional array is large (the size of the multi-dimensional array is higher than the thread private cache size in the CPU) )
- the number of units actually used in the query is lower than a certain threshold, such as 1%
- the utilization rate of the grouping vector of the multidimensional array mapping is low and the aggregation calculation performance is reduced.
- FIG. 4 an example of a connection method based on multi-dimensional packet mapping is shown. Filter out the primary key column c_custkey and grouping column c_nation of the records in the Customer table, and perform dictionary table compression on the grouping attribute column c_nation in the result set. Different grouping values are stored in the dictionary vector, and the subscript of the dictionary vector is used as the compression of the grouping value. coding. Similarly, the grouping attribute s_nation filtered and projected by the supplier table also creates a dictionary vector, and uses the dictionary vector subscript as the compression code of the s_nation attribute value.
- the Customer table and the supplier table create a hash table based on the grouping attribute compression code, and perform the connection operation with the lineorder foreign key column.
- the connection result is c_nation, s_nation compression code and lineorder record ID.
- the c_nation and s_nation dictionary vectors are mapped to a two-dimensional array.
- the compression coding values of c_nation and s_nation in the connection result correspond to the subscripts on the two dimensions of the two-dimensional array. Map the connection result to a two-dimensional array, store the subscript of the one-dimensional array converted from the two-dimensional array in the unit corresponding to the lineorder record in the vector index, and store the result of each connection record in the vector index.
- the connection result is The empty record is set to a null value in the corresponding unit in the vector index.
- the vector index uses the compressed format, that is, the connection result is stored as a [FID, GID] two-tuple, and the lineorder table ID corresponding to the non-empty connection result and the mapped two-dimensional array address are stored in the compressed vector Index.
- the aggregation calculation is completed on the measurement column, and the measurement column unit corresponding to each non-empty vector index unit is mapped to the grouped vector unit corresponding to the two-dimensional array address stored in the vector index unit for aggregation calculation.
- the final grouping vector stores the query results.
- the non-empty cell vector address is converted into a two-dimensional array subscript format.
- Each dimension array subscript is mapped to the corresponding cell of the corresponding dimension table dictionary vector subscript to read the grouping attribute value, and generate the final search result.
- the grouping vector unit [2] corresponds to the multi-dimensional array address [0, 2]
- the first dimension address 0 corresponds to the value Canada in the customer table dictionary vector unit [0]
- the second dimension address 2 corresponds to the supplier table dictionary vector unit [2]
- the value Korea in the grouping vector list [2] is combined with the value 626 stored in the grouping vector list [2] to form the connection result Canada, Korea, 626, and so on, all non-empty units in the grouping vector are converted and mapped to generate the connection result.
- connection stage a multi-dimensional array is generated and mapped to a one-dimensional sparse grouping vector.
- the specific method is to use a sequence generator to assign a unique sequence to each non-empty mapping vector unit during mapping.
- the maximum sequence ID value is the maximum grouping value corresponding to the GROUP-BY attribute, and the maximum grouping value length is created.
- Grouping vector aggregator Then create a vector index based on the dense grouping vector, which is used to measure the aggregation calculation operation on the data column.
- the grouping vector aggregator is used for the aggregation calculation on the measurement attribute of the fact table, and the measurement attribute is mapped to the corresponding unit of the grouping vector aggregator for aggregation calculation according to the value of the vector index non-empty unit or the value of the compressed vector index GID column.
- the non-empty unit of the grouping vector After performing the aggregation calculation task, convert the non-empty unit of the grouping vector into a multi-dimensional array subscript according to the vector address, map each subscript value to the dictionary table vector of each dimension table, access each GROUP-BY attribute value, and store it according to the grouping vector unit
- the address value of the grouping vector aggregator accesses the aggregation calculation result in the grouping vector aggregator unit to generate a grouping aggregation result set.
- This method uses the multi-dimensional array mapping method in the connection stage to optimize the hash grouping method, while the sparse multi-dimensional grouping is mapped to the smallest grouping vector aggregator, which optimizes the grouping mapping performance in the aggregation calculation stage.
- connection result record [0](1,0) corresponds to the two-dimensional array address [1,0]
- the one-dimensional grouping vector corresponds to the unit address [3]
- the current grouping vector unit [3] If it is empty, get the current value 0 from the sequence generator and store it in the grouping vector unit [3], and record the value 0 in the grouping vector [3] in the vector index unit [0].
- the value in the grouping vector unit is written into the corresponding vector index unit; that is, when the grouping vector unit corresponding to the connection is empty, from The sequence value obtained in the sequence generator is stored in the grouping vector unit, and the sequence value is written into the vector index corresponding unit.
- the connection selection rate is low, the compressed vector index can be directly generated, and the vector index or the compressed vector index is generated after the connection operation is completed.
- the aggregation calculation is completed by the grouping vector aggregator based on the vector index or the compressed vector index. For example, the value stored in the vector index unit [2] is 1, access the metric column [2] unit, and map the metric value 626 to the grouping vector aggregator unit [1] for accumulation calculation. After the aggregation calculation is completed, each unit of the grouping vector aggregator accesses the grouping attribute value through the two-dimensional address mapping to generate the final query result. Map the non-empty unit address in the grouping vector to a two-dimensional array address.
- the grouping vector unit [2] is mapped to [0,2], the first dimension subscript 0 is mapped to the customer table dictionary vector to obtain Canada, and the second dimension is under Mark 2 is mapped to the supplier table dictionary vector to obtain Korea, and the grouping vector unit value 1 is mapped to the grouping vector aggregator unit [1] to obtain 626, which is merged into the query result record Canada, Korea, 626.
- the final query result set is generated after scanning and processing the non-empty cells of the grouping vector.
- step 3 a multi-core parallel aggregation calculation method based on vector index is adopted.
- the basic implementation method of multi-core parallel aggregation operation based on vector index If the vector index is a fixed-length vector, the vector index partition is logically divided according to the number of threads, and the fact data partition is logically divided according to the same position, and each thread scans the vector index partition in parallel And perform the vector aggregation operation on the corresponding fact data partition; if the vector index is in a compressed format, the same logical partition method as the fixed-length vector index is adopted.
- the present invention adopts the strategy of balancing logical partitions on the compressed vector index to simplify the location of the logical partitions Calculate and make the aggregate calculation cost of each thread corresponding to the logical partition more balanced.
- two aggregation calculation methods can be used: one is that each thread uses a private grouping vector to complete the grouping calculation operation on the logical fact data partition of the thread, and the inter-thread grouping vector is executed after each thread is processed.
- the merge calculation of generates the final global grouping vector; the second is that multi-threads share a unified grouping vector and perform global grouping aggregation calculations based on the concurrency control mechanism.
- the private grouping vector method maintains a private grouping vector for each thread to complete the aggregation calculation on the data shards of the thread. Finally, it is necessary to merge the grouping vectors of each thread to obtain the global grouping aggregation calculation result.
- the size of the grouping vector is smaller than the CPU thread private cache (L1cache, L2cache, L3cache slice), the grouping aggregation calculation has higher data locality, and the private grouping vector improves the efficiency of parallel computing.
- the shared grouping vector method maintains a globally unified grouping vector for each thread, and each thread concurrently accesses the global grouping vector unit for global aggregation calculations.
- Concurrent control mechanism is required to ensure that the cumulative value of the grouping vector unit is updated concurrently. The correctness of concurrent data updates.
- the optimal threshold is that the size of the packet vector does not exceed 75% of the slice size of L3cache, and the private packet vector will produce higher cache misses. At this time, the use of shared packet vectors will have better Calculation efficiency.
- the result sets of the grouping and aggregation operations are stored in the connection and aggregation calculation stages respectively.
- the grouping metadata is created in the connection operation, such as the grouping vector, the dimension table grouping dictionary vector, etc., and the specific values of the grouping attributes are recorded ;
- the grouping vector aggregator without semantics is used in the aggregation calculation.
- the aggregation calculation result is stored in the grouping vector aggregator, and the semantics of the aggregation calculation cannot be analyzed.
- the present invention uses vector index as an intermediary between the two calculation stages of connection and aggregation.
- the vector index supports the optimal aggregation calculation, and the connection stage produces the optimal vector index structure.
- the present invention can be applied to heterogeneous computing platforms.
- the connection stage with high computational cost is deployed on high-performance hardware acceleration platforms such as GPU, FPGA, Phi, etc., and huge fact table data is stored in a highly scalable distributed storage system.
- the hardware accelerator platform generates a vector index after performing high-performance connection operations.
- the vector index is sent to the distributed storage system data shards to complete the localized aggregation calculation and aggregation result merging tasks based on the vector index.
- the global aggregation and merging calculation is completed to the master node
- the aggregation calculation result is returned, and the master node combines the grouping vector of the hardware accelerator platform and the grouping vector aggregator result returned by the distributed storage system into a query output result set.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Operations Research (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
Claims (10)
- 一种基于分组向量的哈希多表连接实现方法,其特征在于包括以下步骤:1)对SQL查询命令进行改写,将一个完整的SQL查询命令划分为选择-投影-分组-连接操作和聚集操作两个子任务;2)选择-投影-分组-连接操作子任务用于创建生成SQL查询命令中GROUP-BY语句对应的分组向量,并创建向量索引作为选择-投影-分组-连接操作子任务的输出结果;3)聚集操作子任务执行基于向量索引的聚集计算,并将聚集计算结果存储在与分组向量等长的分组向量聚集器对应的单元中;4)将分组向量聚集器中的聚集计算结果与步骤2)创建的分组向量合并,作为SQL查询命令的结果集输出。
- 如权利要求1所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述步骤1)中,所述选择-投影-分组-连接操作子任务中,选择、投影、分组操作作用在维表上,用于创建连接操作的维表哈希表;连接操作执行事实表外键列与所述维表哈希表之间的多表连接操作,并生成向量索引作为连接操作的输出结果;所述聚集操作子任务执行事实表度量列上基于所述向量索引的聚集计算任务,得到聚集计算结果。
- 如权利要求2所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述步骤2)中,所述向量索引是一个与分组聚集计算表等长的向量数据结构,所述向量索引中空值单元表示该单元对应的表中的记录不参与分组聚集计算,非空单元则表示该单元对应的表中的记录参与分组聚集计算;所述向量索引的单元值参照所述分组向量,所述分组向量对应GROUP-BY语句产生的分组属性集合,所述分组属性集合中的每个分组属性成员对应一个分组向量单元,分组属性成员ID设置为所述分组向量的下标地址,所述向量索引的非空单元中记录对应的分组向量单元地址,通过所述向量索引的单元值直接映射到分组向量对应的单元进行聚集计算。
- 如权利要求3所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述步骤2)中,创建所述向量索引的连接操作包括三种实现方法:哈希预分组连接方法、基于多维分组映射的连接方法以及基于稀疏多维数组压缩的连接方法。
- 如权利要求4所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述哈希预分组连接方法包括以下步骤:执行哈希多表连接操作后,为生成的连接记录的GROUP-BY分组属性创建哈希表,并执行哈希分组操作;创建一个全局序列生成器,从0开始以步长为1递增,为向量索引分配唯一的分 组向量ID;对每一个生成的连接记录的GROUP-BY分组属性进行哈希探测,并根据哈希探测结果创建向量索引,具体的:如果哈希表中未探测到相同的GROUP-BY分组属性值,则创建哈希记录,从全局序列生成器中获得当前序列生成器值作为当前分组属性值的ID,然后序列生成器值增加1,并将分组属性值写入当前分组属性值ID对应的分组向量相应的单元中,同时将该分组属性值ID写入向量索引中对应位置的单元中,或者将记录位置ID和分组属性值ID追加到压缩向量索引中;如果哈希表中探测到相同的GROUP-BY分组属性值时,则将该哈希记录的分组属性值ID存储于向量索引对应的单元,或者将记录位置ID和分组属性值ID追加到压缩向量索引中;其中,在获得序列生成器当前值时利用latch对序列值加锁,序列生成器值增加1后释放latch锁。
- 如权利要求4所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述基于多维分组映射的连接方法,包括以下步骤:对维表进行选择、投影、分组操作后,对投影出来的分组属性值进行动态字典表压缩,将分组属性映射为一个字典向量,字典向量下标作为该维表上的分组属性ID,当维表上有多个分组属性时,每一个分组属性组合值映射为一个向量单元,得到存储有维记录主键值和分组ID的维表哈希表;执行事实表外键列与维表哈希表之间的多表连接操作,得到多个维表分组ID组合值,将多个维表分组ID映射为一个多维数组,每个维表分组ID代表多维数组一个维上的下标,多表连接结果对应一个多维数组地址;将多维数组进一步映射为一维向量,多维数组地址转换为一维向量下标,得到向量索引,将连接结果记录在所述向量索引中。
- 如权利要求4所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述基于稀疏多维数组压缩的连接方法,包括以下步骤:对维表进行选择、投影、分组操作后,对投影出来的分组属性值进行动态字典表压缩,将分组属性映射为一个字典向量,字典向量下标作为该维表上的分组属性ID,当维表上有多个分组属性时,每一个分组属性组合值映射为一个向量单元,得到存储有维记录主键值和分组ID的维表哈希表;执行事实表外键列与维表哈希表之间的多表连接操作,得到多个维表分组ID组合值,将多个维表分组ID映射为一个多维数组,每个维表分组ID代表多维数组一个维上的下标,多表连接结果对应一个多维数组地址;将连接阶段生成的多维数组映射为一维稀疏的分组向量,并根据所述一维稀疏的分组向量创建稠密的分组向量,即在映射时使用序列生成器为每一个非空向量单元分配唯一序列;基于稠密的分组向量创建向量索引,用于事实表度量数据列上的聚集计算操作。
- 如权利要求1所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述步骤3)中,采用基于向量索引的多核并行聚集计算方法,具体实现方法为:若向量索引为定长向量,则按线程数量逻辑划分向量索引分区,并按相同的位置逻辑划分事实数据分区,每个线程并行扫描向量索引分区并执行在对应事实数据分区上的向量聚集操作;若向量索引为压缩格式,则采用在压缩向量索引中均衡逻辑分区的方法,按压缩向量索引分区扫描事实表数据分区进行聚集计算,简化逻辑分区的位置计算并使各线程对应逻辑分区上的聚集计算代价均衡。
- 如权利要求8所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:在并行向量分组聚集操作执行时,采用两种聚集计算方法:一是每个线程使用私有分组向量完成本线程逻辑事实数据分区上的分组聚集计算操作,各线程处理完毕后执行线程间分组向量的归并计算,生成最终的全局分组向量;二是多线程共享统一的分组向量,执行基于并发控制机制的全局分组聚集计算,且当分组向量的大小超过超过最优阈值时,采用全局分组聚集计算方法,当分组向量的大小不超过最优阈值时,采用私有分组向量聚集计算方法,其中,最优阈值为CPU线程每核心最后一级cache单元大小的75%。
- 如权利要求4所述的一种基于分组向量的哈希多表连接实现方法,其特征在于:所述步骤4)中,将分组向量聚集器中的聚集计算结果与连接操作子任务创建的分组向量元数据合并的方法为:当采用哈希预分组连接方法时:将分组向量聚集器单元的地址映射到连接阶段的分组向量中获取地址所映射的GROUP-BY属性值,与分组向量聚集器中存储的聚集计算结果合并,作为SQL查询命令的输出结果;当采用基于多维分组映射的连接方法时,将分组向量聚集器单元的地址映射为多维数组地址,然后分别访问相应维表分组向量并解析出其对应的GROUP-BY属性值,合并而成SQL查询命令的输出结果;当采用稀疏多维数组压缩的连接方法时,将分组向量非空单元根据向量地址转换为多维数组下标,将各下标值映射到各维表字典表向量,访问各GROUP-BY属性值,同时根据分组向量单元存储的分组向量聚集器地址值访问分组向量聚集器单元中的聚集计算结果,生成SQL查询命令的输出结果。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/266,080 US11797509B2 (en) | 2019-06-11 | 2020-01-14 | Hash multi-table join implementation method based on grouping vector |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910500054.7 | 2019-06-11 | ||
CN201910500054.7A CN110263038B (zh) | 2019-06-11 | 2019-06-11 | 一种基于分组向量的哈希多表连接实现方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020248604A1 true WO2020248604A1 (zh) | 2020-12-17 |
Family
ID=67917542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/071885 WO2020248604A1 (zh) | 2019-06-11 | 2020-01-14 | 一种基于分组向量的哈希多表连接实现方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11797509B2 (zh) |
CN (1) | CN110263038B (zh) |
WO (1) | WO2020248604A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263038B (zh) | 2019-06-11 | 2021-06-15 | 中国人民大学 | 一种基于分组向量的哈希多表连接实现方法 |
US11704318B1 (en) * | 2020-06-12 | 2023-07-18 | A9.Com, Inc. | Micro-partitioning based search |
CN111506271B (zh) * | 2020-06-29 | 2020-10-09 | 南京鹏云网络科技有限公司 | 数据段单元传输差异化管理方法和分布式块存储系统 |
CN113297248B (zh) * | 2020-07-27 | 2022-04-26 | 阿里巴巴集团控股有限公司 | 数据处理、资源分配方法、装置、设备及可读存储介质 |
CN112260951A (zh) * | 2020-09-14 | 2021-01-22 | 北京天融信网络安全技术有限公司 | 期待连接处理方法、装置、可读存储介质和电子设备 |
CN112269791B (zh) * | 2020-11-30 | 2024-04-05 | 上海特高信息技术有限公司 | 一种区块链账本处理方法 |
CN113297209B (zh) * | 2021-02-10 | 2024-03-08 | 阿里巴巴集团控股有限公司 | 数据库执行哈希连接的方法以及装置 |
CN113032427B (zh) * | 2021-04-12 | 2023-12-08 | 中国人民大学 | 一种用于cpu和gpu平台的向量化查询处理方法 |
CN117555903B (zh) * | 2024-01-05 | 2024-04-09 | 珠海星云智联科技有限公司 | 一种数据处理方法、计算机设备及介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100131540A1 (en) * | 2008-11-25 | 2010-05-27 | Yu Xu | System, method, and computer-readable medium for optimizing processing of distinct and aggregation queries on skewed data in a database system |
CN103294831A (zh) * | 2013-06-27 | 2013-09-11 | 中国人民大学 | 列存储数据库中基于多维数组的分组聚集计算方法 |
US20140351239A1 (en) * | 2013-05-23 | 2014-11-27 | Microsoft Corporation | Hardware acceleration for query operators |
CN105631003A (zh) * | 2015-12-28 | 2016-06-01 | 北京赛思信安技术股份有限公司 | 支持海量数据分组统计的智能索引构建、查询及维护方法 |
CN110263038A (zh) * | 2019-06-11 | 2019-09-20 | 中国人民大学 | 一种基于分组向量的哈希多表连接实现方法 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100656528B1 (ko) * | 2001-09-10 | 2006-12-12 | 한국과학기술원 | 영역-합 질의를 위한 동적 업데이트 큐브와 하이브리드질의 검색방법 |
US7831617B2 (en) * | 2006-07-25 | 2010-11-09 | Microsoft Corporation | Re-categorization of aggregate data as detail data and automated re-categorization based on data usage context |
US8832073B2 (en) * | 2007-06-29 | 2014-09-09 | Alcatel Lucent | Method and apparatus for efficient aggregate computation over data streams |
CN101944116B (zh) * | 2010-09-20 | 2013-01-16 | 常州伊冉科技有限公司 | 一种数据仓库中复杂多维层次的连接和聚集方法 |
CN102306176B (zh) * | 2011-08-25 | 2013-09-25 | 浙江鸿程计算机系统有限公司 | 一种基于数据仓库内在特征的olap关键词查询方法 |
CN102663116B (zh) * | 2012-04-11 | 2014-04-23 | 中国人民大学 | 面向列存储数据仓库的多维olap查询处理方法 |
CN102663114B (zh) * | 2012-04-17 | 2013-09-11 | 中国人民大学 | 面向并发olap的数据库查询处理方法 |
CN102663117B (zh) * | 2012-04-18 | 2013-11-20 | 中国人民大学 | 面向数据库与Hadoop混合平台的OLAP查询处理方法 |
CN102982103A (zh) * | 2012-11-06 | 2013-03-20 | 东南大学 | 一种olap海量多维数据维存储方法 |
US9442949B2 (en) * | 2013-03-14 | 2016-09-13 | Futurewei Technologies, Inc. | System and method for compressing data in a database |
US11468061B2 (en) * | 2013-03-15 | 2022-10-11 | Teradata Us, Inc. | Incremental simplification and optimization of complex queries using dynamic result feedback |
CN103366015B (zh) * | 2013-07-31 | 2016-04-27 | 东南大学 | 一种基于Hadoop的OLAP数据存储与查询方法 |
CN103631911B (zh) * | 2013-11-27 | 2017-11-03 | 中国人民大学 | 基于数组存储和向量处理的olap查询处理方法 |
CN104866608B (zh) * | 2015-06-05 | 2018-01-09 | 中国人民大学 | 一种数据仓库中基于连接索引的查询优化方法 |
US9875276B2 (en) * | 2015-06-15 | 2018-01-23 | Sap Se | Database view generation |
-
2019
- 2019-06-11 CN CN201910500054.7A patent/CN110263038B/zh active Active
-
2020
- 2020-01-14 US US17/266,080 patent/US11797509B2/en active Active
- 2020-01-14 WO PCT/CN2020/071885 patent/WO2020248604A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100131540A1 (en) * | 2008-11-25 | 2010-05-27 | Yu Xu | System, method, and computer-readable medium for optimizing processing of distinct and aggregation queries on skewed data in a database system |
US20140351239A1 (en) * | 2013-05-23 | 2014-11-27 | Microsoft Corporation | Hardware acceleration for query operators |
CN103294831A (zh) * | 2013-06-27 | 2013-09-11 | 中国人民大学 | 列存储数据库中基于多维数组的分组聚集计算方法 |
CN105631003A (zh) * | 2015-12-28 | 2016-06-01 | 北京赛思信安技术股份有限公司 | 支持海量数据分组统计的智能索引构建、查询及维护方法 |
CN110263038A (zh) * | 2019-06-11 | 2019-09-20 | 中国人民大学 | 一种基于分组向量的哈希多表连接实现方法 |
Also Published As
Publication number | Publication date |
---|---|
CN110263038A (zh) | 2019-09-20 |
US20210256006A1 (en) | 2021-08-19 |
CN110263038B (zh) | 2021-06-15 |
US11797509B2 (en) | 2023-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020248604A1 (zh) | 一种基于分组向量的哈希多表连接实现方法 | |
US9177025B2 (en) | Hash-join in parallel computation environments | |
Xie et al. | Simba: Efficient in-memory spatial analytics | |
EP3365804B1 (en) | Columnar data arrangement for semi-structured data | |
US8660985B2 (en) | Multi-dimensional OLAP query processing method oriented to column store data warehouse | |
CN103942342B (zh) | 一种内存数据库oltp&olap并发查询优化方法 | |
US8762407B2 (en) | Concurrent OLAP-oriented database query processing method | |
JP4073033B1 (ja) | 結合演算の処理機能の向上を考慮した合成関係演算を利用したマルチオペレーション・プロセッシングを用いたデータベースのクエリー処理システム | |
WO2018157765A1 (zh) | 数据库系统中对象的编码及运算方法与数据库服务器 | |
US20110302151A1 (en) | Query Execution Systems and Methods | |
WO2013152543A1 (zh) | 面向列存储数据仓库的多维olap查询处理方法 | |
EP2469423B1 (en) | Aggregation in parallel computation environments with shared memory | |
WO2018157680A1 (zh) | 一种执行计划的生成方法、装置及数据库服务器 | |
JP4071816B1 (ja) | 合成関係演算を利用したマルチオペレーション・プロセッシングを用いたデータベースのクエリー処理システム | |
US20230103328A1 (en) | Data compression techniques | |
US9141654B2 (en) | Executing user-defined function on a plurality of database tuples | |
Hasan et al. | Data transformation from sql to nosql mongodb based on r programming language | |
Arnold et al. | HRDBMS: Combining the best of modern and traditional relational databases | |
Bergami et al. | A Join Operator for Property Graphs. | |
Ordonez et al. | A survey on parallel database systems from a storage perspective: rows versus columns | |
EP2469424B1 (en) | Hash-join in parallel computation environments | |
Zhang et al. | MOSS-DB: a hardware-aware OLAP database | |
CN113742346A (zh) | 资产大数据平台架构优化方法 | |
Schmidt et al. | Basic components for building column store-based applications | |
Zhang et al. | Keyword oriented bitmap join index for in-memory analytical processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20822367 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20822367 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20822367 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 02/02/2024) |