WO2015172533A1 - Database query method and server - Google Patents

Database query method and server Download PDF

Info

Publication number
WO2015172533A1
WO2015172533A1 PCT/CN2014/090240 CN2014090240W WO2015172533A1 WO 2015172533 A1 WO2015172533 A1 WO 2015172533A1 CN 2014090240 W CN2014090240 W CN 2014090240W WO 2015172533 A1 WO2015172533 A1 WO 2015172533A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
cache
data
module
rows
Prior art date
Application number
PCT/CN2014/090240
Other languages
French (fr)
Chinese (zh)
Inventor
李伟鑫
吴继敏
蔡利元
刘成华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2015172533A1 publication Critical patent/WO2015172533A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof

Definitions

  • the embodiments of the present invention relate to database technologies, and in particular, to a database query method and a server.
  • the cache Cache is a temporary storage located between the CPU and the memory RAM. Its capacity is smaller than the memory but the exchange speed is fast. When the CPU calls a large amount of data, it can be called directly from the Cache, thus speeding up the reading.
  • the Cache is divided into several levels: L1, L2, and L3.
  • L1 is a single-core exclusive use space for a single CPU
  • L2 is a space for multi-core shared by a single CPU
  • L3 is a space shared by multiple CPUs and multiple cores, for example,
  • the sizes of the Caches of L1, L2, and L3 are 32K, 512K, and 15360K, respectively.
  • the delays of the three CACHEs are: 1.2 ns, 4 ns, and 30 ns, respectively.
  • data reading takes 100 ns.
  • the data is in different positions, and the time required is several or ten times different.
  • the chips of different manufacturers are not the same, the multiples of the delay differences are basically the same.
  • the columnar in-memory database is a database in which the index is organized by columns.
  • the data of each column is stored closely and closely. This storage organization is particularly suitable for analyzing a small number of columns of large data in the analysis scenario.
  • the system When querying, the system only reads the records that need to be processed. Columns, not all data columns.
  • the current column processing is a column decomposition condition.
  • the intermediate data may be caused by the Cache's limited capacity, which may cause the Cache to save the data.
  • RAM In RAM. If the intermediate data is further calculated from the subsequent operations, the intermediate data needs to be taken out from the RAM and calculated. The process of reading data from the RAM will generate a large delay. If the calculation becomes complicated, the delay will increase exponentially.
  • the embodiment of the invention provides a database query method and a server, which reduces the delay of the database query process.
  • the present invention provides a database query method, including: receiving a query request; parsing a query request, obtaining a query condition; determining a target column related to the query condition from the columnar in-memory database; performing the following steps in a loop until Completing the query for all rows: starting from the m*(i-1)+1 row of each target column, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to Cache Cache, according to the query condition, query the data stored by the Cache, send the segmentation query result, and release the storage space of the Cache, where m is a natural constant, and i is a variable, indicating the number of times the loop has been executed.
  • the method further includes: when querying the last m rows or less than m rows of data, the segmented query result further includes a query End the identification to inform the requester that the query request has been completed.
  • the method further includes: after receiving the segmentation query result, the requester stores each segment query result, when receiving the query After the identification is completed, the results of each segment query are combined, and the combined query result is used as the final query result.
  • the in-column in-memory database when the data of a row from the target column in the columnar in-memory database is stored in the cache, the in-column in-memory database Get the last row identifier of the target column or the ground of the next row of the target column When the address or pointer is used, the query end identifier is generated.
  • the present invention provides a database query server, including: a receiving module, a parsing module, a determining module, and a query module, wherein: a receiving module is configured to receive a query request sent by a requester, where the query request includes a query condition, and Sending the query request to the parsing module; the parsing module is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, send the query condition to the determining module and the query module, and determine the module for receiving Parsing the query condition sent by the module, and determining the target column related to the query condition from the columnar in-memory database; and the query module, which is used to loop the following steps until the query of all rows is completed: from the mth of each target column *(i-1) +1 line starts, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to the cache cache, and querying the data stored by the cache
  • the query module is specifically configured to send a segment query including an end of the query when querying the last m rows or less than m rows of data As a result, the requester is notified that the request has been completed this time.
  • the database query server further includes: a generating module, configured to: when storing data of a row from the target column in the columnar in-memory database into the cache The query end identifier is generated when the last row identifier of the target column is obtained in the columnar in-memory database or the address or pointer of the next row of the target column cannot be obtained.
  • a generating module configured to: when storing data of a row from the target column in the columnar in-memory database into the cache The query end identifier is generated when the last row identifier of the target column is obtained in the columnar in-memory database or the address or pointer of the next row of the target column cannot be obtained.
  • the present invention provides a method for querying a columnar in-memory database, the method comprising:
  • Receiving a query request parsing the query request, obtaining the query condition; determining the target column related to the query condition from the column in-memory database; performing the following steps in a loop until the query of all rows is completed: from the m* of each target column -1) Starting at +1 line, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to the cache cache, and querying the data stored by the cache according to the query condition, in temporary storage
  • the module stores the segmentation query result, and releases the storage space of the Cache, where m is a natural constant, i is a variable, indicating the number of times the loop has been executed; and when the data of the target column related to the query condition is all queried, the temporary storage module is combined.
  • the result of each segment query is used, and the combined query result is used as the final query result, and the final query result is sent.
  • the method further includes: when storing data of a row from a target column in the columnar in-memory database, to the cache, and obtaining the data in the columnar in-memory database
  • a query end identifier is generated to notify the temporary storage module to combine the result of each segment query in the temporary storage module.
  • the present invention provides a database query server, including: a receiving module, a parsing module, a determining module, a query module, and a temporary storage module, wherein: the receiving module is configured to receive a query request sent by the requester, where the query request includes a query condition And sending the query request to the parsing module; the parsing module is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, and send the query condition to the determining module and the query module; Receiving the query condition sent by the parsing module, and determining a target column related to the query condition from the column in-memory database; and a query module for performing the following steps cyclically until all rows are completed: from each target column Starting from the m*(i-1)+1 line, the data of consecutive m rows or the data of the last less than m rows in each target column in the columnar in-memory database is stored in the cache cache, and the cache is stored according to the query condition.
  • the database query server further includes: a generating module, configured to: store data of a row from the target column in the columnar in-memory database into the cache When the last row identifier of the target column is obtained in the columnar in-memory database, or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the segments in the temporary storage module. search result.
  • a generating module configured to: store data of a row from the target column in the columnar in-memory database into the cache
  • a query end identifier is generated to notify the temporary storage module to combine the segments in the temporary storage module. search result.
  • the data generated during the query process and the data of the query can only be stored in the Cache, and does not overflow into the RAM, thereby reducing the query.
  • Data overflow in the process reduces the latency of data queries.
  • FIG. 1 is a schematic flowchart of a database query method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of another method for querying a database according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a query process of a database query method according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a database query server according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of still another database query server according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of still another database query server according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of still another database query server according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of an embodiment of a database query method according to the present invention.
  • the method is applied to a columnar in-memory database, and the columnar data of the columnar in-memory database is stored in the memory, and the method includes:
  • Step 101 Receive a query request.
  • the external device can directly send a query request to the database query server.
  • the requester is an external device, and the external device can also send a query request to the database query server through the proxy device.
  • the requester is a proxy device, and the proxy device can be located in the database. In the query server, it can also be located outside the database query server, and the query request contains the query conditions.
  • Step 102 Parse the query request and obtain the query condition.
  • the database query server obtains the query condition according to the query request.
  • the query condition may be a SQL statement, such as: SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18.
  • Step 103 Determine, from the columnar in-memory database, a target column related to the query condition
  • the column involved in the query condition is the target column.
  • "FEE>100AND AGE>18" in the query condition involves two columns, FEE column and AGE column, then these two columns are the target columns related to the query condition.
  • Step 104 Loop through the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the data of consecutive m rows in each target column in the columnar in-memory database or The data of the last less than m rows is stored in the Cache.
  • the query condition the data stored by the Cache is queried, the segmentation query result is sent, and the storage space of the Cache is released, where m is a natural constant, and m represents a target column for each query.
  • the number of lines, m is 1000-10000, especially 8192 lines, i is a variable, indicating the number of times the loop has been executed. For example, i is 1 when the first loop is executed, and i is 2 when the second loop is executed.
  • FIG. 2 is a flowchart of an embodiment of a database query method according to the present invention.
  • the method is applied to a columnar in-memory database, and the columnar data of the columnar in-memory database is stored in the memory, and the method includes:
  • Steps 201-203 are the same as steps 101-103.
  • Step 204 Perform the following steps cyclically until the query of all the rows is completed: starting from the m*(i-1)+1 row of the respective target columns, the respective target columns in the columnar in-memory database are consecutively m
  • the data of the row or the data of the last less than m rows is stored in the cache cache, and the data stored in the cache is queried according to the query condition, and the segment query result is stored in the temporary storage module, and the storage space of the cache is released, wherein m is a natural constant, m is the number of rows per query target column, i is a variable, indicating the number of times the loop has been executed;
  • Step 205 After the data of the target column related to the query condition is all queried, combine the result of each segment query in the temporary storage module, and use the combined query result as a final query result, and send the final query result.
  • the SQL statement in step 102 needs to query two columns, FEE column and AGE column, and according to the preset setting, the number of rows of the target column is 8192 rows each time, that is, m is 8192, and the column is first
  • the first consecutive 8192 rows of data in the FEE column of the in-memory database are stored in the Cache, and the operation is performed according to the query condition and the 8192 rows of data stored in the cache.
  • the first intermediate operation result is obtained, and the first intermediate operation result is stored in the Cache.
  • the result of each intermediate operation can be a bitmap.
  • the operation is performed to obtain the segmentation query result of the query, and then the segment query result is sent to the requester or to the temporary storage module, and the result is released.
  • the storage space of the Cache so that the Cache space can be freed for the next segmentation query.
  • the 8192*(i-1)+1 row starts, and the data of the continuous 8192 rows in the target column is read from the columnar in-memory database into the Cache, and the result of the intermediate operation and the query is obtained.
  • the way each loop executes is the same. There are many ways to read the data of a row from the columnar in-memory database. For example, when reading a row of data, you will get the next row of data pointers or addresses of the row, that is, if the i-th segment is executed.
  • the pointer or address of the first row of the i-th segment query is obtained from the last row of the i-1th segment query, and then the obtained i-th segment query is obtained.
  • the pointer or address of the first line is obtained.
  • each line has a line number, which line needs to be read, and is read by the line number.
  • the data of the last unqueried row of the FEE column in the columnar in-memory database is stored in the cache, and the storage is performed according to the query condition and the Cache. Finally, the data of the row is not queried, and the operation is performed to obtain the first intermediate operation result of the query, and the first intermediate operation result of the query is stored in the Cache. Then, the data of the last Query row of the AGE column in the columnar in-memory database is stored in the Cache, and the operation is performed according to the query condition and the data of the last unqueried row stored by the Cache, and the second intermediate operation result of the query is obtained.
  • the operation is performed to obtain the segmentation query result of the query, and then sent to the requester or to the temporary storage module.
  • the result of the segmentation query releases the storage space of the Cache.
  • the last row identifier of the target column is obtained in the columnar in-memory database or the address of the next row of the target column cannot be obtained. Or a pointer, a query end identifier is generated.
  • a query end identifier is generated.
  • there are multiple ways to determine whether a row of the query is the last row For example, if the number of rows in the target column is known in advance, the number of rows that have been queried is counted in the query process to know whether the row of the query is the last. One line, end the query.
  • the last row in each column has a last row identifier, which indicates that the row is the last row of the column, so when the last row identifier is obtained in the columnar in-memory database during the query, it can be known This line is the last line and ends the query. Or, when querying each row, you will get the pointer or address of the next row. If you can't get the pointer or address of the next row in the columnar in-memory database when querying a row, it means that the last row of the query is finished. Inquire.
  • the database query server can directly send each segment query result to the external device or the proxy device. If directly sent to the proxy device, the proxy device can directly send each segment query result directly to the external device, and the proxy device can also directly The segmentation query result is stored. When the query end flag is received, all segmentation query results are combined according to the query end flag, and the combined segmentation query result is sent to the external device.
  • the database query server can also store the results of each segment query in the temporary storage module in the database query server. When the query end flag is obtained, it means that all the segmentation queries are finished, so all segmentation query results are combined, and Send the combined segmentation query result to the requester.
  • the data of the query is queried for a period of time, so the data generated during the query and the data of the query can only be stored in the Cache, and does not overflow into the RAM, thus reducing the data overflow during the query process, reducing the The delay of the data query.
  • the database query server receives an SQL query statement sent by the requester.
  • the SQL query statement is: SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18.
  • the database query server process is shown in Figure 3.
  • the database query server After the database query server receives the SQL statement, parse the SQL statement and confirm the SQL.
  • the statement is related to the FEE column and the AGE column in the DETAIL_RECORD table.
  • the database query server stores the data of the FEE column for a continuous 8192 rows to the Cache, and performs an operation according to the expression 'FEE>100' and the data of the 8192 rows stored by the Cache, and generates a first bitmap of 8192 rows. (bitmap).
  • the row identifier in the first bitmap that meets the FEE>100 condition is 1, and the row identifier that does not meet the FEE>100 condition is 0, and the first bitmap is stored in the Cache.
  • the specific Cache above may be an L2 level Cache.
  • the general data types are BOOLEAN, INT8, INT16, INT32, INT64, DOUBLE, VARCHAR type, the minimum data type is 1bit, the common numeric type is no more than 8 bytes, and the character type is 30 words. Within the festival. Because of an 8-byte data, 8192 lines is also 64KB. Considering the result of the operation, a bitmap is 8K.
  • the data and processing intermediate results are all within the L2 level CACHE, including the program code segment, so that the CPU can be well utilized. Low delay characteristic of L2 class.
  • the first bitmap and the second bitmap perform an AND operation, generate a third bitmap of 8192 rows, and store the third bitmap in the level. Cache.
  • the database query server includes the processing end identifier in the last message that returns to the external device.
  • steps 2-5 are executed cyclically, if the segmentation query is performed for the first time, in steps 2 and 3, the database query server stores the first consecutive 8192 rows of data in the FEE column to the Cache, if it is the i-th The segment query is executed one time. In steps 2 and 3, the database query server stores the data of the (i)th consecutive 8192 rows that are not queried in the FEE column into the Cache. If the segmentation query is executed last time, in steps 2 and 3, the database query server stores the data of all the remaining rows of the FEE column that have not been queried into the Cache.
  • a database query of 1 billion record data tables is taken as an example, and the amount of data to be evaluated is as follows:
  • the Cache can store the fixed length bitmap instead of the line number set or the excessively long bitmap that satisfies the filtering condition in the traditional scheme. All intermediate results of the query process are calculated in the L2 level Cache. Processing, and will not overflow into the RAM, thus greatly shortening the processing delay of the system and improving the efficiency of the query.
  • FIG. 4 is a schematic structural diagram of a database query server 400 according to an embodiment of the present invention, including: a processor 402, a memory 406, an input/output interface 408, a network interface 408, and a bus 412.
  • Bus 412 can include a path for communicating information between various components of the database query server.
  • the processor 402 is configured to process information, execute instructions or operations, and may be a general purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), or one or more for controlling An integrated circuit executed by the program of the present invention.
  • the database query server also includes one or more memories 406 for storing information and instructions, such as: database query server data, the memory may be read-only memory (ROM) or other types that can store static information and instructions.
  • ROM read-only memory
  • the processor 402 further includes a cache cache 404.
  • the cache 404 is configured to obtain a piece of data of the columnar in-memory database from the memory 406.
  • the processor 402 can directly obtain the segment of the cache 404 and perform a segmentation query process.
  • the process data is also stored in the cache 404. After each segment of the query process ends, the stored content in the cache 404 is cleared, and the storage space of the cache 404 is released.
  • Input and output interface 408 can include an input device or an output device.
  • the input device is configured to receive data and information input by the user, such as a keyboard, a mouse, a camera, a scanner, a light pen, a voice input device, a touch screen, and the like.
  • the output device is used to allow output or display of information to the user, including display screens, printers, speakers, and the like.
  • the database query server also includes a network interface 410 that uses devices such as any transceiver to communicate with other devices or communication networks, such as Ethernet, Radio Access Network (RAN), Wireless Local Area Network (WLAN), and the like.
  • the processor 402 can also be coupled to the input and output interface 408, the network interface 410 via the bus 412.
  • an embodiment of the present invention further discloses a database query server, which includes a receiving module 51, a parsing module 52, a determining module 53, and a query module 54, wherein:
  • the receiving module 51 is configured to receive a query request sent by the requester, where the query request includes a query condition, and send the query request to the parsing module 52;
  • the parsing module 52 is configured to receive the query request sent by the receiving module 51, parse the query request, obtain the query condition, and send the query condition to the determining module 53 and the query module 54;
  • a determining module 53 configured to receive a query condition sent by the parsing module, and determine, from a column in-memory database, a target column related to the query condition;
  • the query module 54 is configured to cyclically execute the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the respective target columns in the columnar in-memory database The data of the consecutive m rows or the data of the last less than m rows is stored in the cache cache, and according to the query condition, the data stored by the cache is queried, the query result is sent, and the storage space of the cache is released, where m is a natural The constant, i is a variable, indicating the number of times the loop has been executed.
  • the database query server further includes a generating module 55, configured to be used in the columnar in-memory database when storing data of a row of the target column in the columnar in-memory database into the cache.
  • a query end identifier is generated.
  • the query module 54 is specifically configured to: when querying the last m rows or less than m rows of data, sending the inclusion The query ends the query result of the identification to notify the requester that the current query request has been completed.
  • another embodiment of the present invention further discloses a database query server, including a receiving module 61, a parsing module 62, a determining module 63, a query module 64, and a temporary storage module 65, wherein:
  • the receiving module 61 is configured to receive a query request sent by the requester, where the query request includes a query condition, and send the query request to the parsing module 62;
  • the parsing module 62 is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, and send the query condition to the determining module 63 and the query module 64;
  • a determining module 63 configured to receive a query condition sent by the parsing module 62, and determine, from the column in-memory database, a target column related to the query condition;
  • the query module 64 is configured to perform the following steps in a loop until all the rows of the query are completed: starting from the m*(i-1)+1 row of each target column, the m rows in each target column in the columnar in-memory database The data or the last less than m rows of data is stored in the cache cache, according to the query conditions, query the data stored by the cache, store the query result in the temporary storage module, and release the storage space of the cache, where m is a natural constant, i is a variable , indicating the number of times the loop has been executed;
  • the temporary storage module 65 is configured to store the query result during the execution of the loop. When the data of the target column related to the query condition is all queried, the respective query results in the temporary storage module are combined, and the combined query result is used as the final query result. , send the final query result.
  • the database query server further includes: a generating module 66, configured to: when the data of a row of the target column in the columnar in-memory database is stored into the cache, obtain the target column in the columnar in-memory database When the last line identifier or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the respective query results in the temporary storage module.
  • a generating module 66 configured to: when the data of a row of the target column in the columnar in-memory database is stored into the cache, obtain the target column in the columnar in-memory database
  • a query end identifier is generated to notify the temporary storage module to combine the respective query results in the temporary storage module.
  • the cache can store the fixed length bitmap instead of the line number set or the excessively long bitmap that satisfies the filtering condition in the traditional scheme, and all intermediate results of the query process are in the Cache calculation processing without overflowing In RAM, the processing delay of the system is greatly shortened, and the efficiency of the query is improved.
  • the aforementioned program can be stored in a computer readable storage medium.
  • the program when executed, performs the steps including the foregoing method embodiments; and the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Embodiments of the present invention provide a method for querying a columnar memory database. The method comprises: receiving a query request; parsing the query request to acquire a query condition; determining a target column related to the query condition from a columnar memory database; and cyclically performing the following steps till the query of all rows is completed: starting from an [m*(i-1)+1]th row of each target column, storing, into a Cache, data queried in consecutive m rows or data queried in less than m rows at last in all the target columns in the columnar memory database, querying the data stored in the Cache according to the query condition, sending sectionalized query results, and releasing the storage space of the Cache, wherein m is a natural constant, and i is a variable representing the cycled number of times. The method of the present invention reduces data overflow in the query process, and shortens delay of data query.

Description

数据库查询方法和服务器Database query method and server 技术领域Technical field
本发明实施例涉及数据库技术,尤其涉及一种数据库查询方法和服务器。The embodiments of the present invention relate to database technologies, and in particular, to a database query method and a server.
背景技术Background technique
缓存Cache是位于CPU与内存RAM之间的临时存储器,它的容量比内存小但交换速度快。当CPU调用大量数据时,可直接从Cache中调用,从而加快读取速度。通常经常下,Cache分为L1、L2、L3几个级别,L1是单CPU的单核独占使用空间,L2是单CPU的多核共享使用的空间,L3是多个CPU多核共享使用的空间,例如:L1、L2、L3的Cache的大小分别为32K、512K和15360K。以每次读取256Byte为例,3个CACHE的时延分别为:1.2ns、4ns和30ns。而在内存RAM中数据读取需要100ns。相同指令情况下,数据在不同的位置,需要的时间相差数倍或数十倍。虽然各个厂家的芯片不尽相同,但时延相差的倍数基本一致。The cache Cache is a temporary storage located between the CPU and the memory RAM. Its capacity is smaller than the memory but the exchange speed is fast. When the CPU calls a large amount of data, it can be called directly from the Cache, thus speeding up the reading. Usually, the Cache is divided into several levels: L1, L2, and L3. L1 is a single-core exclusive use space for a single CPU, L2 is a space for multi-core shared by a single CPU, and L3 is a space shared by multiple CPUs and multiple cores, for example, The sizes of the Caches of L1, L2, and L3 are 32K, 512K, and 15360K, respectively. Taking 256 Bytes per read as an example, the delays of the three CACHEs are: 1.2 ns, 4 ns, and 30 ns, respectively. In the memory RAM, data reading takes 100 ns. In the case of the same command, the data is in different positions, and the time required is several or ten times different. Although the chips of different manufacturers are not the same, the multiples of the delay differences are basically the same.
列式内存数据库是指数按列组织存储的数据库,每个列的数据独立紧密存放,这种存放组织方式特别适合分析场景下对大量数据的少数列查询,查询时,系统仅读取需要处理的列,而不读入全部数据列。The columnar in-memory database is a database in which the index is organized by columns. The data of each column is stored closely and closely. This storage organization is particularly suitable for analyzing a small number of columns of large data in the analysis scenario. When querying, the system only reads the records that need to be processed. Columns, not all data columns.
当前的列式处理是按列分解条件,查询时,每个列单独查询,而后把符合条件的记录行合并求交集,取得最终结果的。例如,SELECT COUNT(*),SUM(v.SALARY)FROM(SELECT AGE,SALARY,CITY,JOB FROM T WHERE AGE>24AND SALARY>5000AND CITY=’SHENZHEN’AND JOB=’SALE’);列式内存数据库执行这个SQL时通常分解为以下几步:The current column processing is a column decomposition condition. When querying, each column is queried separately, and then the qualified record rows are merged to find the intersection, and the final result is obtained. For example, SELECT COUNT(*), SUM(v.SALARY)FROM(SELECT AGE,SALARY,CITY,JOB FROM T WHERE AGE>24AND SALARY>5000AND CITY='SHENZHEN'AND JOB='SALE'); Columnar In-Memory Database When executing this SQL, it is usually broken down into the following steps:
1,扫描AGE列符合AGE>24条件的记录行号集合和数据集;1. Scan the AGE column to meet the AGE>24 conditional record line number set and data set;
2,扫描SALARY列符合SALARY>5000条件的记录行号集合和数据集;2. Scan the SALARY column to meet the SALARY>5000 conditional record line number set and data set;
3,扫描CITY列符合CITY=’SHENZHEN’条件的记录行号集合和数据集;3. Scan the set of record line numbers and data sets of the CITY column that meet the CITY=’SHENZHEN’ condition;
4,扫描JOB列符合JOB=’SALE’条件的记录行号集合和数据集; 4. Scan the JOB column to meet the JOB='SALE' condition of the record line number set and data set;
5,对AGE、SALARY、CITY、JOB的行号集合求交集,得到一个条件都满足的最终行号集合;5. Intersect the set of line numbers of AGE, SALARY, CITY, and JOB to obtain a final set of line numbers satisfying one condition;
6,统计最终的行号集合,扫描中间结果得到COUNT和SUM结果。6. Count the final set of line numbers and scan the intermediate results to get the COUNT and SUM results.
以上每个步骤,尤其是扫描列时,会产生大量的行号集合和数据集合这些中间数据,由于Cache的容量有限,可能会导致Cache无法保存这些数据,那就不得不将这些中间数据溢出到RAM中。如果后续对中间数据进一步运算时,则需要将这些中间数据从RAM中取出并计算,那么从RAM读取数据的过程将产生较大时延,如果计算变复杂,时延就会呈倍数增加。Each of the above steps, especially when scanning the column, generates a large number of line numbers and data sets. The intermediate data may be caused by the Cache's limited capacity, which may cause the Cache to save the data. In RAM. If the intermediate data is further calculated from the subsequent operations, the intermediate data needs to be taken out from the RAM and calculated. The process of reading data from the RAM will generate a large delay. If the calculation becomes complicated, the delay will increase exponentially.
发明内容Summary of the invention
本发明实施例提供一种数据库查询方法和服务器,减少了数据库查询过程的时延。The embodiment of the invention provides a database query method and a server, which reduces the delay of the database query process.
第一方面,本发明提供了一种数据库查询方法,包括:接收查询请求;解析查询请求,获得查询条件;从列式内存数据库中,确定与查询条件相关的目标列;循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据查询条件,查询Cache存储的数据,发送分段查询结果,释放Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数。In a first aspect, the present invention provides a database query method, including: receiving a query request; parsing a query request, obtaining a query condition; determining a target column related to the query condition from the columnar in-memory database; performing the following steps in a loop until Completing the query for all rows: starting from the m*(i-1)+1 row of each target column, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to Cache Cache, according to the query condition, query the data stored by the Cache, send the segmentation query result, and release the storage space of the Cache, where m is a natural constant, and i is a variable, indicating the number of times the loop has been executed.
结合第一方面,在第一方面的第一种可能的实现方式中,该方法还包括:当查询最后的m行或者少于m行的数据时,在发送的分段查询结果中还包含查询结束标识,以通知请求者本次查询请求已经完成。In conjunction with the first aspect, in a first possible implementation manner of the first aspect, the method further includes: when querying the last m rows or less than m rows of data, the segmented query result further includes a query End the identification to inform the requester that the query request has been completed.
结合第一方面的第一种可能,在第一方面的第二种可能的实现方式中,该方法还包括:请求者接收各个分段查询结果后,存储各个分段查询结果,当接收到查询结束标识后,组合各个分段查询结果,并将组合后的查询结果作为最终查询结果。In combination with the first possibility of the first aspect, in a second possible implementation manner of the first aspect, the method further includes: after receiving the segmentation query result, the requester stores each segment query result, when receiving the query After the identification is completed, the results of each segment query are combined, and the combined query result is used as the final query result.
结合第一方面的第一种可能,在第一方面的第三种可能的实现方式中,当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到目标列的末行标识或者无法获取到目标列的下一行的地 址或者指针时,则生成查询结束标识。In conjunction with the first possibility of the first aspect, in a third possible implementation of the first aspect, when the data of a row from the target column in the columnar in-memory database is stored in the cache, the in-column in-memory database Get the last row identifier of the target column or the ground of the next row of the target column When the address or pointer is used, the query end identifier is generated.
第二方面,本发明提供了一种数据库查询服务器,包括,接收模块,解析模块、确定模块和查询模块,其中:接收模块,用于接收请求者发送的查询请求,查询请求包含查询条件,并将查询请求发送到解析模块上;解析模块,用于接收接收模块发送到的查询请求,并解析查询请求,获得查询条件,将查询条件发送到确定模块和查询模块上;确定模块,用于接收解析模块发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;和查询模块,用于循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据查询条件,查询Cache存储的数据,发送分段查询结果和释放Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数。In a second aspect, the present invention provides a database query server, including: a receiving module, a parsing module, a determining module, and a query module, wherein: a receiving module is configured to receive a query request sent by a requester, where the query request includes a query condition, and Sending the query request to the parsing module; the parsing module is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, send the query condition to the determining module and the query module, and determine the module for receiving Parsing the query condition sent by the module, and determining the target column related to the query condition from the columnar in-memory database; and the query module, which is used to loop the following steps until the query of all rows is completed: from the mth of each target column *(i-1) +1 line starts, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to the cache cache, and querying the data stored by the cache according to the query condition. Send the segmentation query result and release the storage space of the Cache, where m is a natural constant, i is a variable, indicating that The number of times the loop is executed.
结合第二方面,在第二方面的第一种可能的实现方式中,该查询模块,具体用于当查询最后的m行或者少于m行的数据时,发送包含查询结束标识的分段查询结果,以通知请求者本次查询请求已经完成。With reference to the second aspect, in a first possible implementation manner of the second aspect, the query module is specifically configured to send a segment query including an end of the query when querying the last m rows or less than m rows of data As a result, the requester is notified that the request has been completed this time.
结合第二方面,在第二方面的第二种可能的实现方式中,数据库查询服务器还包括:生成模块,用于当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到目标列的末行标识或者无法获取到目标列的下一行的地址或者指针时,生成查询结束标识。With reference to the second aspect, in a second possible implementation manner of the second aspect, the database query server further includes: a generating module, configured to: when storing data of a row from the target column in the columnar in-memory database into the cache The query end identifier is generated when the last row identifier of the target column is obtained in the columnar in-memory database or the address or pointer of the next row of the target column cannot be obtained.
第三方面,本发明提供了列式内存数据库查询的方法,方法包括:In a third aspect, the present invention provides a method for querying a columnar in-memory database, the method comprising:
接收查询请求;解析查询请求,获得查询条件;从列式内存数据库中,确定与查询条件相关的目标列;循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或者最后少于m行的数据存储到缓存Cache,根据查询条件,查询Cache存储的数据,在临时存储模块中存储分段查询结果,释放Cache的存储空间,其中m是一个自然常数,i为变量,表示已经循环执行的次数;当查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个分段查询结果,将组合后的查询结果作为最终查询结果,发送最终查询结果。 Receiving a query request; parsing the query request, obtaining the query condition; determining the target column related to the query condition from the column in-memory database; performing the following steps in a loop until the query of all rows is completed: from the m* of each target column -1) Starting at +1 line, storing data of consecutive m rows or data of less than m rows in each target column in the columnar in-memory database to the cache cache, and querying the data stored by the cache according to the query condition, in temporary storage The module stores the segmentation query result, and releases the storage space of the Cache, where m is a natural constant, i is a variable, indicating the number of times the loop has been executed; and when the data of the target column related to the query condition is all queried, the temporary storage module is combined. The result of each segment query is used, and the combined query result is used as the final query result, and the final query result is sent.
结合第三方面,在第三方面的第一种可能的实现方式中,方法还包括:当将从列式内存数据库中的目标列某行的数据存储到cache中,在列式内存数据库中获取到目标列的末行标识,或者无法获取到目标列的下一行的地址或者指针时,生成查询结束标识,以通知临时存储模块组合临时存储模块中的各个分段查询结果。In conjunction with the third aspect, in a first possible implementation manner of the third aspect, the method further includes: when storing data of a row from a target column in the columnar in-memory database, to the cache, and obtaining the data in the columnar in-memory database When the last line identifier of the target column is reached, or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the result of each segment query in the temporary storage module.
第四方面,本发明提供了数据库查询服务器,包括,接收模块,解析模块、确定模块、查询模块和临时存储模块,其中:接收模块,用于接收请求者发送的查询请求,查询请求包含查询条件,并将查询请求发送到解析模块上;解析模块,用于接收接收模块发送到的查询请求,并解析查询请求,获得查询条件,将查询条件发送到确定模块和查询模块上;确定模块,用于接收解析模块发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;和查询模块,用于循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据查询条件,查询Cache存储的数据,在临时存储模块中存储分段查询结果,释放Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数;临时存储模块,用于存储循环执行过程中的分段查询结果,当查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个分段查询结果,将组合后的查询结果作为最终查询结果,发送最终查询结果。In a fourth aspect, the present invention provides a database query server, including: a receiving module, a parsing module, a determining module, a query module, and a temporary storage module, wherein: the receiving module is configured to receive a query request sent by the requester, where the query request includes a query condition And sending the query request to the parsing module; the parsing module is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, and send the query condition to the determining module and the query module; Receiving the query condition sent by the parsing module, and determining a target column related to the query condition from the column in-memory database; and a query module for performing the following steps cyclically until all rows are completed: from each target column Starting from the m*(i-1)+1 line, the data of consecutive m rows or the data of the last less than m rows in each target column in the columnar in-memory database is stored in the cache cache, and the cache is stored according to the query condition. Data, storing the segmentation query result in the temporary storage module, releasing the storage space of the Cache, where m is a Constant, i is a variable, indicating the number of times the loop has been executed; the temporary storage module is used to store the result of the segmentation query during the execution of the loop. When the data of the target column related to the query condition is queried, the temporary storage module is combined. The result of each segment query is used as the final query result and the final query result is sent.
结合第四方面,在第四方面的第一种可能的实现方式中,数据库查询服务器,还包括:生成模块,用于当将从列式内存数据库中的目标列某行的数据存储到cache中,在列式内存数据库中获取到目标列的末行标识,或者无法获取到目标列的下一行的地址或者指针时,生成查询结束标识,以通知临时存储模块组合临时存储模块中的各个分段查询结果。With reference to the fourth aspect, in a first possible implementation manner of the fourth aspect, the database query server further includes: a generating module, configured to: store data of a row from the target column in the columnar in-memory database into the cache When the last row identifier of the target column is obtained in the columnar in-memory database, or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the segments in the temporary storage module. search result.
本发明的实施例中,由于查询的数据都是一段一段来查询的,因此在查询过程中产生的数据以及查询的数据都只可以存储在Cache中,并没有溢出到RAM中,因此降低了查询过程中的数据溢出,减少了数据查询的时延。 In the embodiment of the present invention, since the data of the query is all queried for a period of time, the data generated during the query process and the data of the query can only be stored in the Cache, and does not overflow into the RAM, thereby reducing the query. Data overflow in the process reduces the latency of data queries.
附图说明DRAWINGS
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description of the drawings used in the embodiments or the prior art description will be briefly described below. Obviously, the drawings in the following description It is a certain embodiment of the present invention, and other drawings can be obtained from those skilled in the art without any inventive labor.
图1为本发明实施例的数据库查询方法的流程示意图;1 is a schematic flowchart of a database query method according to an embodiment of the present invention;
图2为本发明实施例的数据库查询另一种方法的流程示意图;2 is a schematic flowchart of another method for querying a database according to an embodiment of the present invention;
图3为本发明实施例的数据库查询方法的一个查询过程的示意图;3 is a schematic diagram of a query process of a database query method according to an embodiment of the present invention;
图4为本发明实施例的数据库查询服务器的结构示意图;4 is a schematic structural diagram of a database query server according to an embodiment of the present invention;
图5为本发明实施例的又一个数据库查询服务器的结构示意图;FIG. 5 is a schematic structural diagram of still another database query server according to an embodiment of the present invention; FIG.
图6为本发明实施例的又一个数据库查询服务器的结构示意图;6 is a schematic structural diagram of still another database query server according to an embodiment of the present invention;
图7为本发明实施例的又一个数据库查询服务器的结构示意图。FIG. 7 is a schematic structural diagram of still another database query server according to an embodiment of the present invention.
具体实施方式detailed description
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。The technical solutions in the embodiments of the present invention will be clearly and completely described in conjunction with the drawings in the embodiments of the present invention. It is a partial embodiment of the invention, and not all of the embodiments.
图1为本发明数据库查询方法实施例的流程图,该方法应用在列式内存数据库中,列式内存数据库的列式数据是存储在内存中的,该方法包括:1 is a flowchart of an embodiment of a database query method according to the present invention. The method is applied to a columnar in-memory database, and the columnar data of the columnar in-memory database is stored in the memory, and the method includes:
步骤101:接收查询请求。Step 101: Receive a query request.
而外部设备可以直接向数据库查询服务器发送查询请求,此时请求者为外部设备,外部设备也可以通过代理装置向数据库查询服务器发送查询请求,此时请求者为代理装置,该代理装置可以位于数据库查询服务器中,也可以位于数据库查询服务器之外,该查询请求包含了查询条件。The external device can directly send a query request to the database query server. At this time, the requester is an external device, and the external device can also send a query request to the database query server through the proxy device. At this time, the requester is a proxy device, and the proxy device can be located in the database. In the query server, it can also be located outside the database query server, and the query request contains the query conditions.
步骤102:解析查询请求,获得查询条件。Step 102: Parse the query request and obtain the query condition.
数据库查询服务器根据查询请求,获得查询条件,这个查询条件可能是一个SQL语句,如:SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18。 The database query server obtains the query condition according to the query request. The query condition may be a SQL statement, such as: SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18.
步骤103:从列式内存数据库中,确定与查询条件相关的目标列;Step 103: Determine, from the columnar in-memory database, a target column related to the query condition;
查询条件涉及到的列就是为目标列,例如:查询条件中“FEE>100AND AGE>18”涉及到两个列,FEE列和AGE列,那么这两个列就是查询条件相关的目标列。The column involved in the query condition is the target column. For example, "FEE>100AND AGE>18" in the query condition involves two columns, FEE column and AGE column, then these two columns are the target columns related to the query condition.
步骤104:循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或最后少于m行的数据存储到Cache,根据所述查询条件,查询Cache存储的数据,发送分段查询结果和释放所述Cache的存储空间,其中m为自然常数,m表示每次查询目标列的行数,m为1000-10000,特别可以取8192行,i是变量,表示已经循环执行的次数,比如第一次循环执行时i为1,第二次循环执行时,i为2。Step 104: Loop through the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the data of consecutive m rows in each target column in the columnar in-memory database or The data of the last less than m rows is stored in the Cache. According to the query condition, the data stored by the Cache is queried, the segmentation query result is sent, and the storage space of the Cache is released, where m is a natural constant, and m represents a target column for each query. The number of lines, m is 1000-10000, especially 8192 lines, i is a variable, indicating the number of times the loop has been executed. For example, i is 1 when the first loop is executed, and i is 2 when the second loop is executed.
图2为本发明数据库查询方法实施例的流程图,该方法应用在列式内存数据库中,列式内存数据库的列式数据是存储在内存中的,该方法包括:2 is a flowchart of an embodiment of a database query method according to the present invention. The method is applied to a columnar in-memory database, and the columnar data of the columnar in-memory database is stored in the memory, and the method includes:
步骤201-203与步骤101-103相同。Steps 201-203 are the same as steps 101-103.
步骤204:循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或者最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,在临时存储模块中存储分段查询结果,释放所述Cache的存储空间,其中m是一个自然常数,m表示每次查询目标列的行数,i为变量,表示已经循环执行的次数;Step 204: Perform the following steps cyclically until the query of all the rows is completed: starting from the m*(i-1)+1 row of the respective target columns, the respective target columns in the columnar in-memory database are consecutively m The data of the row or the data of the last less than m rows is stored in the cache cache, and the data stored in the cache is queried according to the query condition, and the segment query result is stored in the temporary storage module, and the storage space of the cache is released, wherein m is a natural constant, m is the number of rows per query target column, i is a variable, indicating the number of times the loop has been executed;
步骤205:当所述查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个分段查询结果,将组合后的查询结果作为最终查询结果,发送所述最终查询结果。Step 205: After the data of the target column related to the query condition is all queried, combine the result of each segment query in the temporary storage module, and use the combined query result as a final query result, and send the final query result.
例如:上述步骤102中的SQL语句,需要查询两列,FEE列和AGE列,而根据预先的设定,每次查询目标列的行数为8192行,即m为8192,,则首先将列式内存数据库中FEE列第一个连续8192行的数据存储到Cache,并根据查询条件和Cache存储的该8192行的数据,进行运算,获 得第一个中间运算结果,并将该第一中间运算结果存储在Cache中。然后将列式内存数据库中AGE列第一个连续8192行的数据存储到Cache,并根据查询条件和Cache存储的该8192行的数据,进行运算,获得第二个中间运算结果,并将该第二中间运算结果存储在Cache中。各个中间运算结果可以是一个位图。For example, the SQL statement in step 102 needs to query two columns, FEE column and AGE column, and according to the preset setting, the number of rows of the target column is 8192 rows each time, that is, m is 8192, and the column is first The first consecutive 8192 rows of data in the FEE column of the in-memory database are stored in the Cache, and the operation is performed according to the query condition and the 8192 rows of data stored in the cache. The first intermediate operation result is obtained, and the first intermediate operation result is stored in the Cache. Then storing the first consecutive 8192 rows of data in the AGE column of the columnar in-memory database to the Cache, and performing operations according to the query condition and the 8192 rows of data stored in the Cache to obtain a second intermediate operation result, and the first The result of the second intermediate operation is stored in the Cache. The result of each intermediate operation can be a bitmap.
接着,再根据查询条件、第一个中间运算结果和第二个中间运算结果,进行运算,获得本次查询的分段查询结果,然后向请求者或者向临时存储模块发送分段查询结果,释放Cache的存储空间,这样可以腾出Cache的空间,以进行下一次分段查询。Then, according to the query condition, the first intermediate operation result and the second intermediate operation result, the operation is performed to obtain the segmentation query result of the query, and then the segment query result is sent to the requester or to the temporary storage module, and the result is released. The storage space of the Cache, so that the Cache space can be freed for the next segmentation query.
执行下i次分段查询时,第8192*(i-1)+1行开始,从列式内存数据库中读入目标列中的连续8192行的数据到Cache中,中间运算和取得查询的结果的方式,每一次循环执行都是相同的。从列式内存数据库读取某行的数据的方式可以有多种,比如:在读取某行数据时,会得到该行的下一行数据指针或者地址,也就是说如果执行第i次分段查询时,就从第i-1次分段查询的最后一行读取过程中,获取该第i次分段查询的第一行的指针或者地址,然后获取到的该第i次分段查询的第一行的指针或者地址。或者每行都有一个行号,需要读取哪一行,就通过行号来读取。When the next sub-segment query is executed, the 8192*(i-1)+1 row starts, and the data of the continuous 8192 rows in the target column is read from the columnar in-memory database into the Cache, and the result of the intermediate operation and the query is obtained. The way each loop executes is the same. There are many ways to read the data of a row from the columnar in-memory database. For example, when reading a row of data, you will get the next row of data pointers or addresses of the row, that is, if the i-th segment is executed. When the query is performed, the pointer or address of the first row of the i-th segment query is obtained from the last row of the i-1th segment query, and then the obtained i-th segment query is obtained. The pointer or address of the first line. Or each line has a line number, which line needs to be read, and is read by the line number.
当各个目标列中未被查询行的行数少于m(8192)行时,则将列式内存数据库中FEE列最后未被查询行的数据存储到Cache,并根据查询条件和Cache存储的该最后未被查询行的数据,进行运算,获得本次查询的第一个中间运算结果,并将本次查询的第一中间运算结果存储在Cache中。然后将列式内存数据库中AGE列最后未被查询行的数据存储到Cache,并根据查询条件和Cache存储的最后未被查询行的数据,进行运算,获得本次查询的第二个中间运算结果,并将本次查询的该第二中间运算结果存储在Cache中。再根据查询条件、本次查询的第一个中间运算结果和本次查询的第二个中间运算结果,进行运算,获得本次查询的分段查询结果,然后向请求者或者向临时存储模块发送分段查询结果,释放Cache的存储空间。 When the number of rows of the unqueried row in each target column is less than m (8192) rows, the data of the last unqueried row of the FEE column in the columnar in-memory database is stored in the cache, and the storage is performed according to the query condition and the Cache. Finally, the data of the row is not queried, and the operation is performed to obtain the first intermediate operation result of the query, and the first intermediate operation result of the query is stored in the Cache. Then, the data of the last Query row of the AGE column in the columnar in-memory database is stored in the Cache, and the operation is performed according to the query condition and the data of the last unqueried row stored by the Cache, and the second intermediate operation result of the query is obtained. And storing the second intermediate operation result of the query in the Cache. According to the query condition, the first intermediate operation result of the query and the second intermediate operation result of the current query, the operation is performed to obtain the segmentation query result of the query, and then sent to the requester or to the temporary storage module. The result of the segmentation query releases the storage space of the Cache.
当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到所述目标列的末行标识或者无法获取到所述目标列的下一行的地址或者指针时,则生成查询结束标识。在查询过程中,有多种方式判断查询的一行是否为最后一行,例如:如果事先知道目标列的行数,这样通过在查询过程中统计已经查询的行数,从而获知查询的一行是否为最后一行,结束查询。或者,各个列中的最后一行有一个末行标识,该末行标识指示该行是本列的最后一行,因此在查询过程中,在列式内存数据库中获得该末行标识时,就可以获知该行是最后一行,结束查询。或者,在查询每一行的同时,将会获得下一行的指针或者地址,如果查询一行时,在列式内存数据库中不能获得下一行的指针或者地址时,则表示查询的该行为最后一行,结束查询。When the data of a row of the target column in the columnar in-memory database is stored in the cache, the last row identifier of the target column is obtained in the columnar in-memory database or the address of the next row of the target column cannot be obtained. Or a pointer, a query end identifier is generated. During the query process, there are multiple ways to determine whether a row of the query is the last row. For example, if the number of rows in the target column is known in advance, the number of rows that have been queried is counted in the query process to know whether the row of the query is the last. One line, end the query. Or, the last row in each column has a last row identifier, which indicates that the row is the last row of the column, so when the last row identifier is obtained in the columnar in-memory database during the query, it can be known This line is the last line and ends the query. Or, when querying each row, you will get the pointer or address of the next row. If you can't get the pointer or address of the next row in the columnar in-memory database when querying a row, it means that the last row of the query is finished. Inquire.
数据库查询服务器可以将每一个分段查询结果直接发送给外部设备或代理装置,如果直接发送给代理装置的话,代理装置可以直接将每一个分段查询结果直接发送给外部设备,代理装置也可以先存储各个分段查询结果,当接收到查询结束标志时,根据查询结束标志,组合所有的分段查询结果,并将组合后的分段查询结果发送给外部设备。数据库查询服务器还可以将各个分段查询结果存储在数据库查询服务器里的临时存储模块中,当获得查询结束标志,就意味着所有的分段查询结束了,因此组合所有的分段查询结果,并将组合后的分段查询结果发送给请求者。The database query server can directly send each segment query result to the external device or the proxy device. If directly sent to the proxy device, the proxy device can directly send each segment query result directly to the external device, and the proxy device can also directly The segmentation query result is stored. When the query end flag is received, all segmentation query results are combined according to the query end flag, and the combined segmentation query result is sent to the external device. The database query server can also store the results of each segment query in the temporary storage module in the database query server. When the query end flag is obtained, it means that all the segmentation queries are finished, so all segmentation query results are combined, and Send the combined segmentation query result to the requester.
由于列式内存数据库的列式数据的一列存在很多行,一次性查询所有行,会导致Cache的数据存储不下,溢出RAM中,这样会带来查询的时延,本发明的实施例中,由于查询的数据都是一段一段来查询的,因此在查询过程中产生的数据以及查询的数据都只可以存储在Cache中,并没有溢出到RAM中,因此降低了查询过程中的数据溢出,减少了数据查询的时延。Since there are many rows in the column data of the columnar in-memory database, all the rows are queried in one time, which causes the data of the Cache to be stored in the RAM, which overflows the RAM, which brings about the delay of the query. In the embodiment of the present invention, The data of the query is queried for a period of time, so the data generated during the query and the data of the query can only be stored in the Cache, and does not overflow into the RAM, thus reducing the data overflow during the query process, reducing the The delay of the data query.
下面介绍一个例子来说明上述本发明的实施例。An example will be described below to explain the above embodiments of the present invention.
数据库查询服务器接收请求者发送的一个SQL查询语句,例如,该SQL查询语句为:SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18。数据库查询服务器处理过程,如图3所示。The database query server receives an SQL query statement sent by the requester. For example, the SQL query statement is: SELECT NAME FROM DETAIL_RECORD WHERE FEE>100AND AGE>18. The database query server process is shown in Figure 3.
1、数据库查询服务器接收了SQL语句后,解析该SQL语句,确认该SQL 语句与DETAIL_RECORD表中的FEE列和AGE列有关。1. After the database query server receives the SQL statement, parse the SQL statement and confirm the SQL. The statement is related to the FEE column and the AGE column in the DETAIL_RECORD table.
2、数据库查询服务器将FEE列一个连续8192行的数据存储到Cache,并根据表达式‘FEE>100’和Cache存储的该8192行的数据,进行运算,生成一张8192行的第一位图(bitmap)。该第一位图中符合FEE>100条件的行标识为1,不符合FEE>100条件的行标识为0,将第一位图存储在Cache中。具体的上述Cache可以为L2级别的Cache,。2. The database query server stores the data of the FEE column for a continuous 8192 rows to the Cache, and performs an operation according to the expression 'FEE>100' and the data of the 8192 rows stored by the Cache, and generates a first bitmap of 8192 rows. (bitmap). The row identifier in the first bitmap that meets the FEE>100 condition is 1, and the row identifier that does not meet the FEE>100 condition is 0, and the first bitmap is stored in the Cache. The specific Cache above may be an L2 level Cache.
在列式数据的处理中,一般的数据类型分别是BOOLEAN、INT8,INT16,INT32,INT64,DOUBLE,VARCHAR类型,最小数据类型是1bit,常用数字类型不超过8字节,字符类型也就30字节以内。因为一个8字节数据,8192行也就是64KB,考虑到运算结果一个bitmap也就是8K,数据和处理中间结果都在L2级别的CACHE以内,包括程序代码段,这样就可以很好的利用CPU的L2级的低时延特性。In the processing of column data, the general data types are BOOLEAN, INT8, INT16, INT32, INT64, DOUBLE, VARCHAR type, the minimum data type is 1bit, the common numeric type is no more than 8 bytes, and the character type is 30 words. Within the festival. Because of an 8-byte data, 8192 lines is also 64KB. Considering the result of the operation, a bitmap is 8K. The data and processing intermediate results are all within the L2 level CACHE, including the program code segment, so that the CPU can be well utilized. Low delay characteristic of L2 class.
3、然后将DETAIL_RECORD表中的AGE列第一个连续8192行的数据存储到Cache,并根据表达式AGE>18和Cache存储的该8192行的数据,进行运算,生成一张8192行的第二位图,该第二位图中符合AGE>18条件的行标识为1,不符合AGE>18条件的行标识为0,并将该第二位图存储在Cache中。3. Then store the first consecutive 8192 rows of data in the AGE column of the DETAIL_RECORD table to the Cache, and perform operations according to the expression AGE>18 and the 8192 rows of data stored in the Cache to generate a second row of 8192 rows. The bitmap, the row identifier of the second bitmap corresponding to the AGE>18 condition is 1, the row identifier that does not meet the AGE>18 condition is 0, and the second bitmap is stored in the Cache.
4、接着,再根据表达式FEE>100AND AGE>18、第一位图和第二位图,执行AND操作,生成一张8192行的第三位图,并将第三位图存储在级别的Cache中。4. Next, according to the expression FEE>100AND AGE>18, the first bitmap and the second bitmap, perform an AND operation, generate a third bitmap of 8192 rows, and store the third bitmap in the level. Cache.
5,根据上述第三位图和表达式SELECT NAME FROM DETAIL_RECORD,从DETAIL_RECORD表中提取出相应的NAME列数据,因此获得了本次分段查询结果,然后向请求者发送分段查询结果和释放Cache的存储空间,这样可以腾出Cache的空间,以进行下一次分段查询。5, according to the third bitmap and the expression SELECT NAME FROM DETAIL_RECORD, extract the corresponding NAME column data from the DETAIL_RECORD table, thus obtaining the result of the segmentation query, and then sending the segmentation query result and releasing the Cache to the requester. Storage space, so that you can free up the Cache space for the next segmentation query.
循环执行上面的过程2-5,每循环执行一次,就查询一个segment,直至所有行都被处理完。数据库查询服务器在最后一条返回外部设备的消息中包含处理结束标识。 Loop through the above process 2-5, once per loop, query a segment until all rows have been processed. The database query server includes the processing end identifier in the last message that returns to the external device.
在循环执行上面的步骤2-5时,如果是第一次执行分段查询,则步骤2和3中,数据库查询服务器将FEE列第一个连续8192行的数据存储到Cache,如果是第i次执行分段查询,则步骤2和3中,数据库查询服务器将FEE列中未被查询的第(i)个连续8192行的的数据存储到Cache中。如果最后一次执行分段查询,则步骤2和3中,数据库查询服务器将FEE列中未被查询的剩余的所有行的数据存储到Cache中。When the above steps 2-5 are executed cyclically, if the segmentation query is performed for the first time, in steps 2 and 3, the database query server stores the first consecutive 8192 rows of data in the FEE column to the Cache, if it is the i-th The segment query is executed one time. In steps 2 and 3, the database query server stores the data of the (i)th consecutive 8192 rows that are not queried in the FEE column into the Cache. If the segmentation query is executed last time, in steps 2 and 3, the database query server stores the data of all the remaining rows of the FEE column that have not been queried into the Cache.
例如:本实施场景以10亿记录数据表数据库查询为例,评估数据量如下:For example, in this implementation scenario, a database query of 1 billion record data tables is taken as an example, and the amount of data to be evaluated is as follows:
Figure PCTCN2014090240-appb-000001
Figure PCTCN2014090240-appb-000001
时延对比分析结果如下:(假设city为300个)The results of the comparative analysis of delays are as follows: (assuming the city is 300)
Figure PCTCN2014090240-appb-000002
Figure PCTCN2014090240-appb-000002
Figure PCTCN2014090240-appb-000003
Figure PCTCN2014090240-appb-000003
其中因CACHE溢出到RAM时延11.38-6.63=4.75s,占查询41.7%.Among them, due to CACHE overflow to RAM delay 11.38-6.63 = 4.75s, accounting for 41.7% of the query.
由于对表数据采用了分段查询,Cache中可以保存定长的bitmap而不是传统方案中满足过滤条件的行号集或者过长的bitmap,查询过程的所有中间结果都在L2级别的Cache中计算处理,而不会溢出到RAM中,因此大大缩短了系统的处理时延,提高了查询的效率。Since the segment data is used for the table data, the Cache can store the fixed length bitmap instead of the line number set or the excessively long bitmap that satisfies the filtering condition in the traditional scheme. All intermediate results of the query process are calculated in the L2 level Cache. Processing, and will not overflow into the RAM, thus greatly shortening the processing delay of the system and improving the efficiency of the query.
图4为本发明实施例的一个数据库查询服务器400的结构示意图,包括:处理器402,内存406,输入输出接口408,网络接口408以及总线412。总线412可包括一通路,在数据库查询服务器各个部件之间传送信息。处理器402用于处理信息,执行指令或操作,具体可以是一个通用中央处理器(CPU),微处理器,特定应用集成电路application-specific integrated circuit(ASIC),或一个或多个用于控制本发明方案程序执行的集成电路。数据库查询服务器还包括一个或多个内存406,用于存储信息和指令,如:数据库查询服务器的数据,存储器可以是只读存储器read-only memory(ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器random access memory(RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是磁盘存储器。这些存储器通过总线412与处理器402相连接。处理器402还包括了缓存cache404,缓存404用于从内存406获取列式内存数据库的一段数据,处理器402可以直接获取缓存404的该段数据,并执行分段查询处理过程,将查询的中间过程数据也存储在缓存404中,在每段查询处理结束后,将清除缓存404中的存储的内容,释放缓存404的存储空间。 FIG. 4 is a schematic structural diagram of a database query server 400 according to an embodiment of the present invention, including: a processor 402, a memory 406, an input/output interface 408, a network interface 408, and a bus 412. Bus 412 can include a path for communicating information between various components of the database query server. The processor 402 is configured to process information, execute instructions or operations, and may be a general purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), or one or more for controlling An integrated circuit executed by the program of the present invention. The database query server also includes one or more memories 406 for storing information and instructions, such as: database query server data, the memory may be read-only memory (ROM) or other types that can store static information and instructions. A static storage device, random access memory (RAM) or other type of dynamic storage device that can store information and instructions, or disk storage. These memories are coupled to processor 402 via bus 412. The processor 402 further includes a cache cache 404. The cache 404 is configured to obtain a piece of data of the columnar in-memory database from the memory 406. The processor 402 can directly obtain the segment of the cache 404 and perform a segmentation query process. The process data is also stored in the cache 404. After each segment of the query process ends, the stored content in the cache 404 is cleared, and the storage space of the cache 404 is released.
输入输出接口408可包括输入装置或输出装置。输入装置用以接收用户输入的数据和信息,例如键盘,鼠标、摄像头,扫描仪,光笔,语音输入装置,触摸屏等。输出装置用以允许输出或显示信息给用户,包括显示屏,打印机,扬声器等。数据库查询服务器还包括一个网络接口410,该网络接口使用任何收发器一类的装置,以便与其他设备或通信网络通信,如以太网,无线接入网(RAN),无线局域网(WLAN)等。处理器402也能通过总线412与输入输出接口408,网络接口410相连接。Input and output interface 408 can include an input device or an output device. The input device is configured to receive data and information input by the user, such as a keyboard, a mouse, a camera, a scanner, a light pen, a voice input device, a touch screen, and the like. The output device is used to allow output or display of information to the user, including display screens, printers, speakers, and the like. The database query server also includes a network interface 410 that uses devices such as any transceiver to communicate with other devices or communication networks, such as Ethernet, Radio Access Network (RAN), Wireless Local Area Network (WLAN), and the like. The processor 402 can also be coupled to the input and output interface 408, the network interface 410 via the bus 412.
上述数据库查询服务器处理器402和缓存404等具体的处理过程,可以参见如图1所示的方法实施例。For the specific processing procedure of the database query server processor 402 and the cache 404, refer to the method embodiment shown in FIG.
如附图5所示,本发明实施例还公开了一种数据库查询服务器,包括,接收模块51,解析模块52、确定模块53和查询模块54,其中:As shown in FIG. 5, an embodiment of the present invention further discloses a database query server, which includes a receiving module 51, a parsing module 52, a determining module 53, and a query module 54, wherein:
接收模块51,用于接收请求者发送的查询请求,所述查询请求包含查询条件,并将所述查询请求发送到解析模块52上;The receiving module 51 is configured to receive a query request sent by the requester, where the query request includes a query condition, and send the query request to the parsing module 52;
解析模块52,用于接收接收模块51发送到的查询请求,并解析查询请求,获得查询条件,将查询条件发送到确定模块53和查询模块54上;The parsing module 52 is configured to receive the query request sent by the receiving module 51, parse the query request, obtain the query condition, and send the query condition to the determining module 53 and the query module 54;
确定模块53,用于接收所述解析模块发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;a determining module 53, configured to receive a query condition sent by the parsing module, and determine, from a column in-memory database, a target column related to the query condition;
查询模块54,用于循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,发送查询结果和释放所述Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数。The query module 54 is configured to cyclically execute the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the respective target columns in the columnar in-memory database The data of the consecutive m rows or the data of the last less than m rows is stored in the cache cache, and according to the query condition, the data stored by the cache is queried, the query result is sent, and the storage space of the cache is released, where m is a natural The constant, i is a variable, indicating the number of times the loop has been executed.
进一步的,如图6所示,上述数据库查询服务器还包括生成模块55,用于用于当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到所述目标列的末行标识或者无法获取到所述目标列的下一行的地址或者指针时,生成查询结束标识。进一步的,所述查询模块54,具体用于当查询最后的m行或者少于m行的数据时,发送包含 所述查询结束标识的查询结果,以通知所述请求者本次查询请求已经完成。Further, as shown in FIG. 6, the database query server further includes a generating module 55, configured to be used in the columnar in-memory database when storing data of a row of the target column in the columnar in-memory database into the cache. When the last row identifier of the target column is obtained or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated. Further, the query module 54 is specifically configured to: when querying the last m rows or less than m rows of data, sending the inclusion The query ends the query result of the identification to notify the requester that the current query request has been completed.
如附图7所示,本发明实施例还公开了另一种数据库查询服务器,包括,接收模块61,解析模块62、确定模块63、查询模块64和临时存储模块65,其中:As shown in FIG. 7, another embodiment of the present invention further discloses a database query server, including a receiving module 61, a parsing module 62, a determining module 63, a query module 64, and a temporary storage module 65, wherein:
接收模块61,用于接收请求者发送的查询请求,查询请求包含查询条件,并将查询请求发送到解析模块62上;The receiving module 61 is configured to receive a query request sent by the requester, where the query request includes a query condition, and send the query request to the parsing module 62;
解析模块62,用于接收接收模块发送到的查询请求,并解析查询请求,获得查询条件,将查询条件发送到确定模块63和查询模块64上;The parsing module 62 is configured to receive the query request sent by the receiving module, parse the query request, obtain the query condition, and send the query condition to the determining module 63 and the query module 64;
确定模块63,用于接收解析模块62发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;和a determining module 63, configured to receive a query condition sent by the parsing module 62, and determine, from the column in-memory database, a target column related to the query condition; and
查询模块64,用于循环执行以下步骤,直到完成所有行的查询:从各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据查询条件,查询Cache存储的数据,在临时存储模块中存储查询结果,释放Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数;The query module 64 is configured to perform the following steps in a loop until all the rows of the query are completed: starting from the m*(i-1)+1 row of each target column, the m rows in each target column in the columnar in-memory database The data or the last less than m rows of data is stored in the cache cache, according to the query conditions, query the data stored by the cache, store the query result in the temporary storage module, and release the storage space of the cache, where m is a natural constant, i is a variable , indicating the number of times the loop has been executed;
临时存储模块65,用于存储循环执行过程中的查询结果,当查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个查询结果,将组合后的查询结果作为最终查询结果,发送最终查询结果。The temporary storage module 65 is configured to store the query result during the execution of the loop. When the data of the target column related to the query condition is all queried, the respective query results in the temporary storage module are combined, and the combined query result is used as the final query result. , send the final query result.
进一步的,所述数据库查询服务器还包括:生成模块66,用于当将从列式内存数据库中的目标列某行的数据存储到cache中,在列式内存数据库中获取到所述目标列的末行标识,或者无法获取到所述目标列的下一行的地址或者指针时,生成查询结束标识,以通知所述临时存储模块组合临时存储模块中的各个查询结果。Further, the database query server further includes: a generating module 66, configured to: when the data of a row of the target column in the columnar in-memory database is stored into the cache, obtain the target column in the columnar in-memory database When the last line identifier or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the respective query results in the temporary storage module.
由于本发明的数据库查询服务器对表数据采用了分段查询,Cache中可以保存定长的bitmap而不是传统方案中满足过滤条件的行号集或者过长的bitmap,查询过程的所有中间结果都在Cache中计算处理,而不会溢出到 RAM中,因此大大缩短了系统的处理时延,提高了查询的效率。Since the database query server of the present invention uses segmentation query for the table data, the cache can store the fixed length bitmap instead of the line number set or the excessively long bitmap that satisfies the filtering condition in the traditional scheme, and all intermediate results of the query process are in the Cache calculation processing without overflowing In RAM, the processing delay of the system is greatly shortened, and the efficiency of the query is improved.
本领域普通技术人员可以理解:实现上述各方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成。前述的程序可以存储于一计算机可读取存储介质中。该程序在执行时,执行包括上述各方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。One of ordinary skill in the art will appreciate that all or part of the steps to implement the various method embodiments described above may be accomplished by hardware associated with the program instructions. The aforementioned program can be stored in a computer readable storage medium. The program, when executed, performs the steps including the foregoing method embodiments; and the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
最后应说明的是:以上各实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述各实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。 Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, and are not intended to be limiting; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that The technical solutions described in the foregoing embodiments may be modified, or some or all of the technical features may be equivalently replaced; and the modifications or substitutions do not deviate from the technical solutions of the embodiments of the present invention. range.

Claims (12)

  1. 一种列式内存数据库查询的方法,其特征在于,所述方法包括:A method for querying a columnar in-memory database, the method comprising:
    接收查询请求;Receiving a query request;
    解析所述查询请求,获得查询条件;Parsing the query request to obtain a query condition;
    从列式内存数据库中,确定与查询条件相关的目标列;From the columnar in-memory database, determine the target column associated with the query condition;
    循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,发送分段查询结果,释放所述Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数。Looping through the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of the respective target columns, the data of consecutive m rows in the respective target columns in the columnar in-memory database will be Or the data of the last less than m rows is stored in the cache cache, and the data stored by the cache is queried according to the query condition, and the segment query result is sent to release the storage space of the cache, where m is a natural constant, i is A variable indicating the number of times the loop has been executed.
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:当查询最后的m行或者少于m行的数据时,在发送的分段查询结果中还包含查询结束标识,以通知所述请求者本次查询请求已经完成。The method according to claim 1, wherein the method further comprises: when querying the last m rows or less than m rows of data, the segmentation query result further includes a query end identifier to notify The requester has completed the query request this time.
  3. 如权利要求2所述的方法,其特征在于,所述方法还包括:所述请求者接收各个分段查询结果后,存储所述各个分段查询结果,当接收到查询结束标识后,组合所述各个分段查询结果,并将组合后的查询结果作为最终查询结果。The method according to claim 2, wherein the method further comprises: after the requester receives the respective segment query results, storing the segmentation query results, and after receiving the query end identifier, the combination office The result of each segment query is described, and the combined query result is taken as the final query result.
  4. 如权利要求2所述的方法,其特征在于,所述方法还包括:The method of claim 2, wherein the method further comprises:
    当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到所述目标列的末行标识或者无法获取到所述目标列的下一行的地址或者指针时,则生成查询结束标识。When the data of a row of the target column in the columnar in-memory database is stored in the cache, the last row identifier of the target column is obtained in the columnar in-memory database or the address of the next row of the target column cannot be obtained. Or a pointer, a query end identifier is generated.
  5. 如权利要求1或2所述的方法,其特征在于,所述Cache为L2级别时,所述m为1000-10000。The method according to claim 1 or 2, wherein when the Cache is at the L2 level, the m is 1000-10000.
  6. 一种数据库查询服务器,其特征在于,包括,接收模块,解析模块、确定模块和查询模块,其中: A database query server, comprising: a receiving module, a parsing module, a determining module and a query module, wherein:
    所述接收模块,用于接收请求者发送的查询请求,所述查询请求包含查询条件,并将所述查询请求发送到所述解析模块上;The receiving module is configured to receive a query request sent by a requester, where the query request includes a query condition, and send the query request to the parsing module;
    所述解析模块,用于接收所述接收模块发送到的查询请求,并解析所述查询请求,获得查询条件,将所述查询条件发送到所述确定模块和所述查询模块上;The parsing module is configured to receive a query request sent by the receiving module, parse the query request, obtain a query condition, and send the query condition to the determining module and the query module;
    所述确定模块,用于接收所述解析模块发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;和The determining module is configured to receive a query condition sent by the parsing module, and determine, from a column in-memory database, a target column related to the query condition; and
    所述查询模块,用于循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,发送分段查询结果和释放所述Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数。The query module is configured to cyclically execute the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the respective targets in the column in-memory database are to be The data of consecutive m rows in the column or the data of the last less than m rows is stored in the cache cache, and the data stored by the cache is queried according to the query condition, the segment query result is sent, and the storage space of the cache is released, where m Is a natural constant, i is a variable, indicating the number of times the loop has been executed.
  7. 如权利要求6所述的数据库查询服务器,其特征在于,所述查询模块,具体用于当查询最后的m行或者少于m行的数据时,发送包含所述查询结束标识的分段查询结果,以通知所述请求者本次查询请求已经完成。The database query server according to claim 6, wherein the query module is configured to send a segment query result including the query end identifier when querying the last m rows or less than m rows of data. To notify the requester that the query request has been completed.
  8. 如权利要求6所述的数据库查询服务器,其特征在于,还包括:生成模块,用于当将从列式内存数据库中的目标列某行的数据存储到cache中时,在列式内存数据库中获取到所述目标列的末行标识或者无法获取到所述目标列的下一行的地址或者指针时,生成查询结束标识。The database query server according to claim 6, further comprising: a generating module, configured to store the data of the row of the target column in the columnar in-memory database into the cache, in the columnar in-memory database When the last row identifier of the target column is obtained or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated.
  9. 一种列式内存数据库查询的方法,其特征在于,所述方法包括:A method for querying a columnar in-memory database, the method comprising:
    接收查询请求;Receiving a query request;
    解析所述查询请求,获得查询条件;Parsing the query request to obtain a query condition;
    从列式内存数据库中,确定与查询条件相关的目标列;From the columnar in-memory database, determine the target column associated with the query condition;
    循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第 m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或者最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,在临时存储模块中存储分段查询结果,释放所述Cache的存储空间,其中m是一个自然常数,i为变量,表示已经循环执行的次数;Loop through the following steps until you complete the query for all rows: from the respective target columns Starting with m*(i-1)+1, storing data of consecutive m rows or data of less than m rows in the respective target columns in the columnar in-memory database to the cache cache, according to the query condition, query The data stored in the cache stores the segmentation query result in the temporary storage module, and releases the storage space of the cache, where m is a natural constant, and i is a variable, indicating the number of times that the loop has been executed;
    当所述查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个分段查询结果,将组合后的查询结果作为最终查询结果,发送所述最终查询结果。After the data of the target column related to the query condition is all queried, the result of each segment query in the temporary storage module is combined, and the combined query result is used as a final query result, and the final query result is sent.
  10. 如权利要求9所述的方法,其特征在于,所述方法还包括:The method of claim 9 wherein the method further comprises:
    当将从列式内存数据库中的目标列某行的数据存储到cache中,在列式内存数据库中获取到所述目标列的末行标识,或者无法获取到所述目标列的下一行的地址或者指针时,生成查询结束标识,以通知所述临时存储模块组合临时存储模块中的各个分段查询结果。When the data of a row of the target column in the columnar in-memory database is stored in the cache, the last row identifier of the target column is obtained in the columnar in-memory database, or the address of the next row of the target column cannot be obtained. Or a pointer, generating a query end identifier to notify the temporary storage module to combine the respective segment query results in the temporary storage module.
  11. 一种数据库查询服务器,其特征在于,包括,接收模块,解析模块、确定模块、查询模块和临时存储模块,其中:A database query server, comprising: a receiving module, a parsing module, a determining module, a query module and a temporary storage module, wherein:
    所述接收模块,用于接收请求者发送的查询请求,所述查询请求包含查询条件,并将所述查询请求发送到所述解析模块上;The receiving module is configured to receive a query request sent by a requester, where the query request includes a query condition, and send the query request to the parsing module;
    所述解析模块,用于接收所述接收模块发送到的查询请求,并解析所述查询请求,获得查询条件,将所述查询条件发送到所述确定模块和所述查询模块上;The parsing module is configured to receive a query request sent by the receiving module, parse the query request, obtain a query condition, and send the query condition to the determining module and the query module;
    所述确定模块,用于接收所述解析模块发送到的查询条件,并从列式内存数据库中,确定与查询条件相关的目标列;和The determining module is configured to receive a query condition sent by the parsing module, and determine, from a column in-memory database, a target column related to the query condition; and
    所述查询模块,用于循环执行以下步骤,直到完成所有行的查询:从所述各个目标列的第m*(i-1)+1行开始,将列式内存数据库中的所述各个目标列中连续m行的数据或最后少于m行的数据存储到缓存Cache,根据所述查询条件,查询所述Cache存储的数据,在临时存储模块中存储分 段查询结果,释放所述Cache的存储空间,其中m是一个自然常数,i是变量,表示已经循环执行的次数;The query module is configured to cyclically execute the following steps until the query of all rows is completed: starting from the m*(i-1)+1 row of each target column, the respective targets in the column in-memory database are to be The data of consecutive m rows in the column or the data of the last less than m rows is stored in the cache cache, and the data stored by the cache is queried according to the query condition, and the data is stored in the temporary storage module. Segment query result, release the storage space of the Cache, where m is a natural constant, i is a variable, indicating the number of times that the loop has been executed;
    所述临时存储模块,用于存储循环执行过程中的分段查询结果,当所述查询条件相关的目标列的数据均被查询后,组合临时存储模块中的各个分段查询结果,将组合后的查询结果作为最终查询结果,发送所述最终查询结果。The temporary storage module is configured to store a segmentation query result in a loop execution process, and after the data of the target column related to the query condition is all queried, combine the segmentation query results in the temporary storage module, and combine the results. The result of the query is sent as the final query result, and the final query result is sent.
  12. 如权利要求11所述的数据库查询服务器,其特征在于,还包括:生成模块,用于当将从列式内存数据库中的目标列某行的数据存储到cache中,在列式内存数据库中获取到所述目标列的末行标识,或者无法获取到所述目标列的下一行的地址或者指针时,生成查询结束标识,以通知所述临时存储模块组合临时存储模块中的各个分段查询结果。 The database query server according to claim 11, further comprising: a generating module, configured to: store data from a row of the target column in the columnar in-memory database into the cache, and obtain the data in the columnar in-memory database When the last row identifier of the target column is obtained, or the address or pointer of the next row of the target column cannot be obtained, a query end identifier is generated to notify the temporary storage module to combine the segment query results in the temporary storage module. .
PCT/CN2014/090240 2014-05-12 2014-11-04 Database query method and server WO2015172533A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410199444.2 2014-05-12
CN201410199444.2A CN103970870A (en) 2014-05-12 2014-05-12 Database query method and server

Publications (1)

Publication Number Publication Date
WO2015172533A1 true WO2015172533A1 (en) 2015-11-19

Family

ID=51240367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090240 WO2015172533A1 (en) 2014-05-12 2014-11-04 Database query method and server

Country Status (2)

Country Link
CN (1) CN103970870A (en)
WO (1) WO2015172533A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300370A1 (en) * 2017-04-18 2018-10-18 Microsoft Technology Licensing, Llc Delay detection in query processing
CN109086279A (en) * 2017-06-13 2018-12-25 北京京东尚科信息技术有限公司 Caching report method and apparatus
CN109885589A (en) * 2017-12-06 2019-06-14 腾讯科技(深圳)有限公司 Data query method, apparatus, computer equipment and storage medium

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970870A (en) * 2014-05-12 2014-08-06 华为技术有限公司 Database query method and server
US9836507B2 (en) 2014-09-17 2017-12-05 Futurewei Technologies, Inc. Method and system for adaptively building a column store database from a temporal row store database based on query demands
CN104331467B (en) * 2014-10-31 2019-02-26 上海玮舟微电子科技有限公司 A kind of data query method and system
US10725987B2 (en) * 2014-11-25 2020-07-28 Sap Se Forced ordering of a dictionary storing row identifier values
CN104537030B (en) * 2014-12-22 2018-03-02 华为技术有限公司 A kind of business datum querying method and device, Database Systems
CN106682042B (en) * 2015-11-11 2019-11-22 杭州海康威视数字技术股份有限公司 A kind of relation data caching and querying method and device
US10558458B2 (en) * 2016-06-06 2020-02-11 Microsoft Technology Licensing, Llc Query optimizer for CPU utilization and code refactoring
CN106294772B (en) * 2016-08-11 2019-03-19 电子科技大学 The buffer memory management method of distributed memory columnar database
CN106776689A (en) * 2016-11-10 2017-05-31 郑州云海信息技术有限公司 A kind of host information querying method, device and main frame
CN110019332A (en) * 2017-09-29 2019-07-16 北京国双科技有限公司 A kind of data query method and device based on precomputation
CN108009250B (en) * 2017-12-01 2021-09-07 武汉斗鱼网络科技有限公司 Multi-classification event data cache establishing and querying method and device
CN109800237A (en) * 2019-01-28 2019-05-24 广州酷狗计算机科技有限公司 Data query method, apparatus and computer readable storage medium
CN113553345A (en) * 2021-07-19 2021-10-26 北京东方国信科技股份有限公司 Query method and device for vectorization database
CN113626555B (en) * 2021-10-12 2022-03-18 中孚安全技术有限公司 Method and system for displaying oversized form through sectional loading

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102171695A (en) * 2008-10-05 2011-08-31 微软公司 Efficient large-scale joining for querying of column based data encoded structures
CN103631838A (en) * 2012-08-24 2014-03-12 软件股份公司 Method and system for storing tabular data in a memory-efficient manner
CN103678556A (en) * 2013-12-06 2014-03-26 华为技术有限公司 Method for processing column-oriented database and processing equipment
CN103970870A (en) * 2014-05-12 2014-08-06 华为技术有限公司 Database query method and server

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102171695A (en) * 2008-10-05 2011-08-31 微软公司 Efficient large-scale joining for querying of column based data encoded structures
CN103631838A (en) * 2012-08-24 2014-03-12 软件股份公司 Method and system for storing tabular data in a memory-efficient manner
CN103678556A (en) * 2013-12-06 2014-03-26 华为技术有限公司 Method for processing column-oriented database and processing equipment
CN103970870A (en) * 2014-05-12 2014-08-06 华为技术有限公司 Database query method and server

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300370A1 (en) * 2017-04-18 2018-10-18 Microsoft Technology Licensing, Llc Delay detection in query processing
US10496647B2 (en) * 2017-04-18 2019-12-03 Microsoft Technology Licensing, Llc Delay detection in query processing
CN109086279A (en) * 2017-06-13 2018-12-25 北京京东尚科信息技术有限公司 Caching report method and apparatus
CN109885589A (en) * 2017-12-06 2019-06-14 腾讯科技(深圳)有限公司 Data query method, apparatus, computer equipment and storage medium
CN109885589B (en) * 2017-12-06 2022-09-16 腾讯科技(深圳)有限公司 Data query method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN103970870A (en) 2014-08-06

Similar Documents

Publication Publication Date Title
WO2015172533A1 (en) Database query method and server
US10198363B2 (en) Reducing data I/O using in-memory data structures
US8521962B2 (en) Managing counter saturation in a filter
KR102147356B1 (en) Cache memory system and operating method for the same
US9524318B2 (en) Minimizing result set size when converting from asymmetric to symmetric requests
JP2015528603A (en) Aggregation / grouping operation: Hardware implementation of hash table method
US9798591B2 (en) Method, apparatus, and chip for implementing mutually-exclusive operation of multiple threads
US11308060B2 (en) Method, apparatus, device and storage medium for managing index
CN112307062B (en) Database aggregation query method, device and system
US10747773B2 (en) Database management system, computer, and database management method
US9646053B2 (en) OLTP compression of wide tables
US20220358178A1 (en) Data query method, electronic device, and storage medium
CN113010535B (en) Cache data updating method, device, equipment and storage medium
US10545867B2 (en) Device and method for enhancing item access bandwidth and atomic operation
CN110928900B (en) Multi-table data query method, device, terminal and computer storage medium
CN108345648B (en) Method and device for acquiring log information based on columnar storage
JP2018509666A (en) Method and apparatus for determining a SQL execution plan
US11868620B2 (en) Read-write method and apparatus, electronic device, and readable memory medium
CN113157629A (en) Data processing method and device, electronic equipment and storage medium
CN110909029A (en) Method and medium for realizing cache based on Nosql
US9223708B2 (en) System, method, and computer program product for utilizing a data pointer table pre-fetcher
CN116303125B (en) Request scheduling method, cache, device, computer equipment and storage medium
WO2024016863A1 (en) Rule lookup method and apparatus, device and computer-readable storage medium
WO2022223047A1 (en) Data read/write method, data read/write controller, and storage medium
CN117271840B (en) Data query method and device of graph database and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14892158

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14892158

Country of ref document: EP

Kind code of ref document: A1