WO2021115304A1 - Data processing method, apparatus and device, and storage medium - Google Patents


Info

Publication number
WO2021115304A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
column
target
rows
expression
Prior art date
Application number
PCT/CN2020/134785
Other languages
French (fr)
Chinese (zh)
Inventor
阮羽彬
吴迪
缪哲语
李猛
梁宇坤
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date: 2019-12-10
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021115304A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Definitions

  • The present invention relates to the field of communication technology, and in particular to a data processing method, apparatus, device, and storage medium.
  • SQL: Structured Query Language.
  • One or more embodiments of the present invention describe a data processing method, apparatus, device, and storage medium to solve the problem in the related art that low data processing efficiency slows down the data query speed of SQL statements.
  • The present invention is implemented as follows.
  • In a first aspect, a data processing method is provided, which may include: obtaining a target model corresponding to an expression in a data query request, where the expression includes constraint conditions; inputting the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint conditions, where the target data includes all data in at least one column and part of the data in at least one column; and outputting the execution result.
  • A data processing apparatus is provided, which may include:
  • an obtaining module, configured to obtain the target model corresponding to the expression in the data query request, where the expression includes constraint conditions;
  • a processing module, configured to input the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint conditions, where the target data includes all data in at least one column and part of the data in at least one column; and
  • an output module, configured to output the execution result.
  • A computing device is provided, including at least one processor and a memory, where the memory is configured to store computer program instructions and the processor is configured to execute the program in the memory to control the computing device to implement the data processing method of the first aspect.
  • A computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the data processing method of the first aspect.
  • In the embodiments of the present invention, the target model corresponding to the expression in the data query request is obtained, where the expression contains constraint conditions, and the expression and the target data in the columnar database are input into the target model to obtain the execution result corresponding to the constraint conditions.
  • The embodiments make full use of the columnar database's ability to fetch a given column of data quickly, reduce the number of writes and outputs during expression evaluation, greatly shorten the time needed to obtain data, reduce computation, and improve data processing efficiency, thereby increasing the query speed of SQL statements.
  • Fig. 1 shows a schematic structural diagram of a data processing method according to an embodiment.
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment.
  • Fig. 3 shows a flowchart of a method for implementing data processing by a first target model according to an embodiment.
  • Fig. 4 shows a flowchart of a method for implementing data processing by a second target model according to an embodiment.
  • Fig. 5 shows a flowchart of a method for implementing data processing by a first target model and a second target model according to an embodiment.
  • Fig. 6 shows a flowchart of a method for implementing data processing by a third target model according to an embodiment.
  • Fig. 7 shows a flowchart of a method for implementing data processing by a fourth target model according to an embodiment.
  • Fig. 8 shows a flowchart of a method for implementing data processing by a fifth target model according to an embodiment.
  • Fig. 9 shows a flowchart of another method for implementing data processing by a fifth target model according to an embodiment.
  • Fig. 10 shows a structural block diagram of a data processing apparatus according to an embodiment.
  • Fig. 11 shows a schematic structural diagram of a computing device according to an embodiment.
  • In the related art, data is accessed through an abstract iterator. Because the iterator abstracts the entire data table, it makes multi-layer function calls whenever it acquires data, which lowers overall processing performance. In addition, every move of the iterator involves a data fetch, and this fetch often involves excessive writing and output, which brings very large overheads and excessive computation, thereby lowering data processing efficiency and slowing down the data query speed of SQL statements.
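  • To make this overhead concrete, the following minimal Python sketch models a hypothetical row-at-a-time iterator that counts its fetch calls; the class and method names are illustrative assumptions, not the related art's actual interface. Summing one column of n rows costs n + 1 separate fetches (the last one signals the end of the table), and each fetch returns a whole row even though only one attribute is needed.
```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]


class RowIterator:
    """Hypothetical iterator over a whole table, returning one row per call."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._pos = 0
        self.fetch_calls = 0  # how many times the iterator was moved

    def fetch_next(self) -> Optional[Row]:
        # Every move of the iterator is a separate fetch operation.
        self.fetch_calls += 1
        if self._pos >= len(self._rows):
            return None  # end of table
        row = self._rows[self._pos]
        self._pos += 1
        return row


def sum_column_row_by_row(rows: List[Row], name: str) -> Tuple[int, int]:
    """Sum one column through the iterator; return (sum, number of fetches)."""
    it = RowIterator(rows)
    total = 0
    row = it.fetch_next()
    while row is not None:
        total += row[name]  # only one attribute is used, but whole rows are fetched
        row = it.fetch_next()
    return total, it.fetch_calls


if __name__ == "__main__":
    table = [{"clicks": 5, "visits": 10}, {"clicks": 7, "visits": 20}, {"clicks": 3, "visits": 30}]
    print(sum_column_row_by_row(table, "clicks"))  # (15, 4)
```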
  • To solve the above problem, embodiments of the present invention provide a data processing method, apparatus, device, and storage medium, described in detail below.
  • The architecture may include a target node and a columnar database.
  • When the target node receives a data query request, it determines the target model corresponding to the expression in the request, where the expression contains constraint conditions. It then inputs the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint conditions, where the target data includes all the data in at least one column and part of the data in at least one column. Finally, it outputs the execution result.
  • The columnar database in the embodiments of the present invention may be a Histore columnar database.
  • The Histore columnar database is a columnar database evolved from an open-source database. The data of each data table in it is stored by column, so all rows of a column can be fetched quickly when an expression is evaluated, which makes the database naturally suited to batch processing. The new batch-based target model in the embodiments of the present invention exploits this property: it divides the data in a data table into batches, and each expression processes data one batch at a time, which speeds up evaluation of the whole expression.
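  • The batch-based idea can be sketched in Python as follows. The table is a hypothetical in-memory stand-in for a columnar store (a dict from column name to a list of values), and BATCH_SIZE is an illustrative value, not one taken from Histore. Because a column is stored contiguously, the expression can take it as a whole and process it one batch at a time, so the number of fetches grows with the number of batches rather than the number of rows.
```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical columnar table: column name -> all values of that column.
ColumnarTable = Dict[str, List[int]]

BATCH_SIZE = 4  # illustrative only; a real engine would tune this


def iter_batches(column: List[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of one column, each with at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def sum_column_by_batch(table: ColumnarTable, name: str,
                        batch_size: int = BATCH_SIZE) -> Tuple[int, int]:
    """Sum one column batch by batch; return (sum, number of batches fetched)."""
    total = 0
    batches = 0
    for batch in iter_batches(table[name], batch_size):
        total += sum(batch)  # one call per batch instead of one fetch per row
        batches += 1
    return total, batches


if __name__ == "__main__":
    table: ColumnarTable = {"clicks": [5, 1, 7, 3, 9, 2], "visits": [10, 20, 30, 40, 50, 60]}
    print(list(iter_batches(table["clicks"])))  # [[5, 1, 7, 3], [9, 2]]
    print(sum_column_by_batch(table, "clicks"))  # (27, 2)
```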
  • The above architecture can be applied to scenarios in which users query data with SQL statements, or to scenarios in which expressions are evaluated based on the above target model.
  • The embodiments of the present invention make full use of the columnar database's ability to fetch a given column of data quickly, reduce the number of writes and outputs during expression evaluation, greatly shorten the time needed to obtain data, reduce computation, and improve data processing efficiency.
  • The data processing method provided by the embodiments of the present invention is further described below with reference to Figs. 2 to 9.
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment.
  • As shown in Fig. 2, the method may include steps 210 to 230. Step 210: obtain the target model corresponding to the expression in the data query request, where the expression contains constraint conditions. Step 220: input the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint conditions. Step 230: output the execution result.
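  • Steps 210 to 230 can be sketched as a lookup followed by a call. This is a minimal Python sketch under stated assumptions: an expression is reduced to an operator string, TARGET_MODELS is a hypothetical registry mapping operators to target models, and each model takes a mask (1 = visible row, 0 = hidden row) plus two columns. None of these names come from the patent.
```python
from typing import Callable, Dict, List, Optional, Tuple

Mask = List[int]      # 1 = visible row, 0 = hidden row
Column = List[int]
ModelResult = Tuple[Mask, Optional[Column]]
TargetModel = Callable[[Mask, Column, Column], ModelResult]


def add_model(mask: Mask, a: Column, b: Column) -> ModelResult:
    # Row-wise a + b; hidden rows get 0 as a placeholder, mask is unchanged.
    return mask, [x + y if m else 0 for m, x, y in zip(mask, a, b)]


def greater_model(mask: Mask, a: Column, b: Column) -> ModelResult:
    # Row-wise a > b; produces a new mask and no value column.
    return [1 if m and x > y else 0 for m, x, y in zip(mask, a, b)], None


# Hypothetical registry: one target model per expression operator.
TARGET_MODELS: Dict[str, TargetModel] = {"+": add_model, ">": greater_model}


def handle_query(op: str, columns: Dict[str, Column], lhs: str, rhs: str) -> ModelResult:
    model = TARGET_MODELS[op]                         # step 210: obtain the target model
    mask = [1] * len(columns[lhs])                    # start with every row visible
    result = model(mask, columns[lhs], columns[rhs])  # step 220: feed the column data in
    return result                                     # step 230: output the execution result


if __name__ == "__main__":
    cols = {"a": [1, 5, 2], "b": [3, 4, 2]}
    print(handle_query("+", cols, "a", "b"))  # ([1, 1, 1], [4, 9, 4])
    print(handle_query(">", cols, "a", "b"))  # ([0, 1, 0], None)
```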
  • In step 210, the target node receives a data query request that includes an expression.
  • For example, the expression may be "count the number of clicks on a certain product", "query the number of visits to a certain platform", and so on.
  • The expression may include at least one constraint condition. For example, the expression "count the clicks on a certain product" may include constraints such as "count the clicks on the product from January to June" and "select the data in which the number of clicks on the product is between 1 and 1000".
  • The method provided by the embodiments of the present invention may further include: determining, according to the data query request, the target model corresponding to the expression in the request.
  • Each expression has a corresponding target model.
  • Six target models are provided in the embodiments and are described in detail below in conjunction with step 220.
  • The target data in the embodiments of the present invention includes all the data in at least one column and part of the data in at least one column.
  • The data in the columnar database is stored by column, and each column separately stores the data of at least one attribute.
  • In one implementation, step 220 may specifically include: inputting the expression and the target data into the target model; and processing the expression and the target data in parallel based on the target model to obtain the execution result corresponding to the constraint conditions.
  • SIMD: Single Instruction Multiple Data.
  • In this way, the SIMD instruction set can be fully used for expression evaluation, which speeds up computation.
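  • In another implementation, step 220 may specifically include: inputting the first bitmap data, the expression, and the target data into the target model, where the first bitmap data is a mask that marks each row as visible (1) or hidden (0); and processing, based on the target model and the first bitmap data, the data that meets the constraint conditions to obtain the execution result corresponding to the constraint conditions.
  • In this way, the number of expression operations can be reduced.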
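  • As a hedged illustration of how the first bitmap data lets later operations skip rows, the Python sketch below turns the example constraints from above ("from January to June" and "clicks between 1 and 1000") into a mask and then sums only the visible rows. The function names, column names, and values are made up for the example.
```python
from typing import List

Mask = List[int]  # 1 = visible row, 0 = hidden row


def build_mask(months: List[int], clicks: List[int]) -> Mask:
    """Mark a row visible only if month is in 1..6 and clicks is in 1..1000."""
    return [1 if 1 <= m <= 6 and 1 <= c <= 1000 else 0 for m, c in zip(months, clicks)]


def sum_visible(mask: Mask, values: List[int]) -> int:
    """Sum values in visible rows only; hidden rows never enter the addition."""
    return sum(v for keep, v in zip(mask, values) if keep)


if __name__ == "__main__":
    months = [1, 3, 7, 6, 2]
    clicks = [10, 0, 50, 1200, 999]
    mask = build_mask(months, clicks)
    print(mask)                       # [1, 0, 0, 0, 1]
    print(sum_visible(mask, clicks))  # 1009
```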
  • With the batch-based target model, the target data can be processed in batches, making full use of the columnar database's ability to fetch a column of data quickly, reducing the number of writes and outputs during expression evaluation, and greatly shortening the time needed to obtain query results.
  • Batch processing is also cache-friendly.
  • Together, these optimizations improve the performance of the entire expression.
  • By passing in a filter mask during processing, that is, by filtering out data that does not meet the constraint conditions first, the number of expression operations can be greatly reduced.
  • The step "processing, based on the target model and the first bitmap data, the data that meets the constraint conditions to obtain the execution result corresponding to the constraint conditions" is described in detail below with reference to at least one of the following target models.
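  • When the target model is the first target model, this step may specifically include: adding, based on the first target model, the first column data and the second column data row by row to obtain third column data, where the third column data has the same number of rows as the first column data; and
  • determining the first bitmap data and the third column data as the execution result corresponding to the constraint condition.
  • The input of the first target model may include the first bitmap data, the expression, the first column data, and the second column data.
  • The output of the first target model may include the first bitmap data and the third column data.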
  • The first target model is a summation model. Fig. 3 shows the tree structure of the expression a+b and its implementation model.
  • Here, a and b are any two columns in the target data, that is, the first column data and the second column data. The model takes the operator "+", the mask (bitmap data), a, and b as input, adds the values of a and b in each row, and outputs the result.
  • The expression in the first target model receives a mask for filtering, a batch of data from column a, and a batch of data from column b.
  • The mask is used to filter out rows that have already been filtered (that is, rows that do not meet the constraint conditions and/or rows marked under a preset condition); the remaining rows are then added batch by batch, and finally a mask and the result a+b for this batch are output.
  • Because the addition does not change the mask (no additional row is marked as hidden, 0, and rows marked visible, 1, stay visible), the input mask can be used directly as the output mask.
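  • A minimal Python sketch of the first target model for one batch, assuming masks and columns are plain lists of equal length; the function name sum_model and the choice of 0 as a placeholder for hidden rows are illustrative assumptions, not details from the patent.
```python
from typing import List, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row


def sum_model(mask: Mask, a: List[int], b: List[int]) -> Tuple[Mask, List[int]]:
    """Sketch of the first target model for one batch: row-wise a + b.

    Hidden rows are skipped; their output slot is filled with 0 as a placeholder.
    The input mask is returned as the output mask because addition hides no rows.
    """
    if not (len(mask) == len(a) == len(b)):
        raise ValueError("mask and both columns must have the same number of rows")
    result = [x + y if keep else 0 for keep, x, y in zip(mask, a, b)]
    return mask, result


if __name__ == "__main__":
    print(sum_model([1, 0, 1, 1], [1, 2, 3, 4], [10, 20, 30, 40]))
    # ([1, 0, 1, 1], [11, 0, 33, 44])
```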
  • When the target model is the second target model, this step may specifically include: when the constraint condition is to select data in the fourth column data that is greater than constant data, comparing, based on the second target model, each row of the fourth column data with the constant data to obtain the execution result corresponding to the constraint condition.
  • The execution result includes second bitmap data.
  • In the second bitmap data, a row is marked visible if it is marked visible in the first bitmap data and the fourth column value in that row is greater than the constant data.
  • The second bitmap data has the same number of rows as the fourth column data.
  • The input of the second target model may include the first bitmap data, the expression, the fourth column data, and the constant data.
  • The output of the second target model may include the second bitmap data.
  • Fig. 4 shows the tree structure of the expression a > 3, where a is the fourth column data and 3 is the constant data, together with its implementation model.
  • The expression in the second target model takes as input a mask, a batch of data from column a, and a batch of the constant 3 with the same number of values as the batch of a.
  • The output mask is derived from the input mask: after the mask is used to filter out rows (rows that do not meet the constraint conditions), a > 3 is compared batch by batch.
  • If a row's value is less than or equal to 3, that row is reset in the result mask: rows that do not satisfy the comparison are marked as hidden (0), while rows that meet the constraint conditions and the comparison remain visible (1). Finally, the result mask is output.
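  • A minimal Python sketch of the second target model for one batch, assuming list-based masks and columns; greater_than_const_model is a hypothetical name. Following the text, the constant is first expanded into a batch of the same length as column a before the row-by-row comparison.
```python
from typing import List

Mask = List[int]  # 1 = visible row, 0 = hidden row


def greater_than_const_model(mask: Mask, a: List[int], const: int) -> Mask:
    """Sketch of the second target model for one batch: evaluate a > const.

    A row stays visible (1) only if it was visible in the input mask and
    a > const; every other row is reset to hidden (0) in the result mask.
    """
    if len(mask) != len(a):
        raise ValueError("mask and column must have the same number of rows")
    consts = [const] * len(a)  # constant data with the same number of rows as a
    return [1 if keep and x > c else 0 for keep, x, c in zip(mask, a, consts)]


if __name__ == "__main__":
    print(greater_than_const_model([1, 1, 0, 1], [5, 2, 9, 3], 3))  # [1, 0, 0, 0]
```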
  • When the target model includes a first model and a second model, the target data includes first target data and second target data, the first target data includes fifth column data and sixth column data, and the fifth column data and the sixth column data have the same number of rows, this step may specifically include:
  • adding, based on the first model, the fifth column data and the sixth column data row by row to obtain seventh column data, where the seventh column data has the same number of rows as the fifth column data; and
  • when the constraint condition is to select data in the seventh column data that is greater than the eighth column data (the second target data), comparing, based on the second model, the seventh column data with the eighth column data row by row to obtain the execution result corresponding to the constraint condition.
  • The execution result includes third bitmap data, in which a row is marked visible if it is marked visible in the first bitmap data and the seventh column value in that row is greater than the eighth column value.
  • The third bitmap data has the same number of rows as the seventh column data.
  • Fig. 5 shows the combined expression (a+b) > c, where c is the second target data.
  • This expression combines example (1), the summation a+b, with example (2), the comparison.
  • The mask and the result a+b output by (1) can be used as the input of the expression (a+b) > c.
  • A batch of data of the same size is taken from c as input, the operation (a+b) > c is performed, and a result mask is obtained.
  • Because the incoming filter (the first bitmap data, or mask) is applied during processing and the result mask is passed to the next node (which may be the second target model and/or another target node), evaluation is sped up and the number of expression operations is greatly reduced.
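  • The combined expression can be sketched by chaining the two models, so the mask produced by the first model becomes the input mask of the second. This Python sketch is self-contained, uses hypothetical function names, and assumes all columns in a batch have the same length.
```python
from typing import List, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row


def sum_model(mask: Mask, a: List[int], b: List[int]) -> Tuple[Mask, List[int]]:
    # First model: row-wise a + b over visible rows; mask passes through unchanged.
    return mask, [x + y if keep else 0 for keep, x, y in zip(mask, a, b)]


def greater_model(mask: Mask, left: List[int], right: List[int]) -> Mask:
    # Second model: keep a row visible only if it was visible and left > right.
    return [1 if keep and x > y else 0 for keep, x, y in zip(mask, left, right)]


def sum_greater_model(mask: Mask, a: List[int], b: List[int], c: List[int]) -> Mask:
    """Evaluate (a + b) > c by feeding the first model's output into the second."""
    mask_ab, ab = sum_model(mask, a, b)   # output mask and a + b of this batch
    return greater_model(mask_ab, ab, c)  # result mask for (a + b) > c


if __name__ == "__main__":
    print(sum_greater_model([1, 1, 0, 1], [1, 5, 2, 0], [1, 1, 9, 4], [3, 5, 0, 4]))
    # [0, 1, 0, 0]
```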
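  • When the target model is the third target model, this step may include: filtering out, based on the third target model, the hidden rows in the ninth column data to obtain the data in the ninth column that lies in visible rows; when the expression is a data summation, accumulating the data in the visible rows to obtain accumulated data; and
  • determining the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.
  • The input of the third target model may include the first bitmap data, the expression, and the ninth column data.
  • The output of the third target model may include the accumulated data and the first bitmap data.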
  • Fig. 6 shows the tree structure of the aggregation expression SUM and its implementation model.
  • This expression takes a filtering mask and the ninth column data a as input. The rows marked 0 in the mask are used to filter out the corresponding rows of the ninth column, the remaining data (that is, the rows marked visible, 1) is summed, and finally a mask and the sum of this batch, sum(a), are output.
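  • A minimal Python sketch of the third target model (SUM) for one batch, with a hypothetical function name; it assumes a full aggregation adds up the per-batch partial sums, which the usage lines illustrate.
```python
from typing import List, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row


def sum_aggregate_model(mask: Mask, a: List[int]) -> Tuple[Mask, int]:
    """Sketch of the third target model for one batch: SUM(a) over visible rows.

    Rows marked 0 in the mask are filtered out before accumulation; the input
    mask is returned unchanged together with the partial sum of this batch.
    """
    if len(mask) != len(a):
        raise ValueError("mask and column must have the same number of rows")
    total = 0
    for keep, x in zip(mask, a):
        if keep:
            total += x
    return mask, total


if __name__ == "__main__":
    # A full SUM would add up the partial sums of every batch.
    batches = [([1, 0, 1, 1], [4, 100, 6, -2]), ([0, 1], [50, 5])]
    print(sum(sum_aggregate_model(m, a)[1] for m, a in batches))  # 13
```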
  • When the target model is the fourth target model, this step may include: filtering out, based on the fourth target model, the hidden rows in the tenth column data to obtain the data in the tenth column that lies in visible rows; marking, according to the preset conditions in the target model, the data in those visible rows to obtain marked eleventh column data; and determining the first bitmap data and the eleventh column data as the execution result corresponding to the constraint condition.
  • The visible rows include first visible rows and second visible rows, and marking the data in the tenth column data that lies in the visible rows to obtain the marked eleventh column data includes:
  • marking, according to the first preset condition, the data in the tenth column data that lies in the first visible rows, and adjusting the first visible rows to hidden rows; and
  • marking the data in the tenth column data that lies in the second visible rows, until the third preset condition is met, at which point marking ends and the marked eleventh column data is obtained.
  • The input of the fourth target model may include the first bitmap data (in this example, containing hidden rows, 0, and visible rows, 1), the expression (carrying three constraints), and the tenth column data.
  • The output of the fourth target model may include the first bitmap data and the eleventh column data.
  • Fig. 7 shows the tree structure of the conditional function CASE WHEN and its implementation model.
  • This expression contains three branches: two conditional judgments (WHEN ... THEN) and a default branch.
  • In the first conditional judgment, the rows shown as gray squares meet the condition, so they are set to the result 5 and then filtered out (marked as hidden rows, 0), so they are not considered in the next conditional judgment.
  • In the second conditional judgment, the rows shown as black squares (already hidden) are filtered out first; the operation then finds that the rows shown as gray squares satisfy the second condition, so they are set to the result 6 and likewise filtered out (marked as hidden rows, 0), so they are not considered in the next conditional judgment.
  • Finally, all remaining rows fall into the default branch and are set to the result 7.
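  • The CASE WHEN evaluation can be sketched in Python as below. The function name, the representation of branches as (condition, value) pairs, and the use of None for rows hidden in the input mask are assumptions for illustration; the example query CASE WHEN a < 10 THEN 5 WHEN a < 20 THEN 6 ELSE 7 END is hypothetical but reuses the results 5, 6, and 7 from the description.
```python
from typing import Callable, List, Optional, Sequence, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row
Branch = Tuple[Callable[[int], bool], int]  # (WHEN condition, THEN value)


def case_when_model(mask: Mask, col: List[int], branches: Sequence[Branch],
                    default: int) -> Tuple[Mask, List[Optional[int]]]:
    """Sketch of the fourth target model: CASE WHEN ... THEN ... ELSE ... END.

    Each branch only looks at rows still visible in a working copy of the mask.
    Rows matched by a branch receive its value and are then hidden in the working
    mask, so later branches skip them. Remaining visible rows get the default.
    Rows hidden in the input mask are left as None. The input mask is returned.
    """
    working = list(mask)  # copy, so the caller's first bitmap data is untouched
    out: List[Optional[int]] = [None] * len(col)
    for condition, value in branches:
        for i, x in enumerate(col):
            if working[i] and condition(x):
                out[i] = value
                working[i] = 0  # hide the row for the next conditional judgment
    for i in range(len(col)):
        if working[i]:
            out[i] = default  # default branch
    return mask, out


if __name__ == "__main__":
    branches = [(lambda x: x < 10, 5), (lambda x: x < 20, 6)]
    print(case_when_model([1, 1, 1, 0, 1], [3, 15, 25, 8, 12], branches, 7))
    # ([1, 1, 1, 0, 1], [5, 6, 7, None, 6])
```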
  • When the target model is the fifth target model, this step may specifically include: determining, according to the first bitmap data, the expression, and the target data, fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression, where the first expression or the second expression is related to any column of data in the target data; performing, based on the fifth target model, a logical operation on the fourth bitmap data and the fifth bitmap data to obtain sixth bitmap data; and
  • determining the sixth bitmap data as the execution result corresponding to the constraint condition.
  • The logical operation includes at least one of the following: an AND gate, an OR gate, and a NOT gate.
  • The input of the fifth target model may include the fourth bitmap data corresponding to the first expression and the fifth bitmap data corresponding to the second expression. Both can be obtained as in example (2) above: the result mask output by (2) for input a can be regarded as the fourth bitmap data, the mask of the first expression a; likewise, the result mask output by (2) for input b can be regarded as the fifth bitmap data, the mask of the second expression b.
  • When the logical operation is an AND gate, Fig. 8 shows the tree structure of an expression containing AND and its implementation model.
  • The expression in the fifth target model takes the result masks of the two sub-expressions as input (the mask output by expression a and the mask output by expression b), performs an AND operation on the two input masks to obtain the result mask, and outputs it.
  • When the logical operation is an OR gate, Fig. 9 shows the tree structure of an expression containing OR and its implementation model.
  • The expression in the fifth target model takes the result masks of the two sub-expressions as input (the mask output by expression a and the mask output by expression b), performs an OR operation on the two input masks to obtain the result mask, and outputs it.
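  • A minimal Python sketch of the fifth target model, with hypothetical function names. It packs each mask into a Python integer so that one bitwise operation combines many rows at once, which loosely mirrors how a native engine would combine machine words or SIMD registers. The NOT variant is an assumption: the text only lists the NOT gate, so here it is restricted to rows visible in the first bitmap data, keeping already-filtered rows hidden.
```python
from typing import List

Mask = List[int]  # 1 = visible row, 0 = hidden row


def pack(mask: Mask) -> int:
    """Pack a 0/1 mask into an integer; bit i holds row i."""
    bits = 0
    for i, keep in enumerate(mask):
        if keep:
            bits |= 1 << i
    return bits


def unpack(bits: int, n_rows: int) -> Mask:
    """Expand an integer back into a 0/1 mask of n_rows rows."""
    return [(bits >> i) & 1 for i in range(n_rows)]


def mask_and(m4: Mask, m5: Mask) -> Mask:
    """Sixth bitmap data = fourth AND fifth bitmap data."""
    return unpack(pack(m4) & pack(m5), len(m4))


def mask_or(m4: Mask, m5: Mask) -> Mask:
    """Sixth bitmap data = fourth OR fifth bitmap data."""
    return unpack(pack(m4) | pack(m5), len(m4))


def mask_not(m: Mask, first: Mask) -> Mask:
    """NOT of a sub-expression mask, limited to rows visible in the first bitmap data."""
    return unpack(pack(first) & ~pack(m), len(m))


if __name__ == "__main__":
    a_mask, b_mask = [1, 0, 1, 1], [1, 1, 0, 1]
    print(mask_and(a_mask, b_mask))              # [1, 0, 0, 1]
    print(mask_or(a_mask, b_mask))               # [1, 1, 1, 1]
    print(mask_not([1, 0, 0, 0], [1, 1, 0, 1]))  # [0, 1, 0, 1]
```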
  • In summary, the target model corresponding to the expression in the data query request is obtained, where the expression contains constraint conditions, and the expression and the target data in the columnar database are input into the target model to obtain the execution result corresponding to the constraint conditions.
  • The embodiments of the present invention make full use of the columnar database's ability to fetch a given column of data quickly and reduce the number of writes and outputs during expression evaluation, which greatly shortens the time needed to obtain data, reduces computation, and improves data processing efficiency, thereby increasing the query speed of SQL statements.
  • By passing in a filter mask, the number of expression operations can be greatly reduced.
  • In addition, the SIMD instruction set can be fully used for expression evaluation, which speeds up computation.
  • Fig. 10 shows a structural block diagram of a data processing apparatus according to an embodiment.
  • The apparatus 1000 may specifically include:
  • an obtaining module 1001, configured to obtain the target model corresponding to the expression in the data query request, where the expression includes constraint conditions;
  • a processing module 1002, configured to input the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint conditions, where the target data includes all data in at least one column and part of the data in at least one column; and
  • an output module 1003, configured to output the execution result.
  • In one possible implementation, the processing module 1002 may be specifically configured to input the first bitmap data, the expression, and the target data into the target model, and
  • to process, based on the target model and the first bitmap data, the data that meets the constraint conditions to obtain the execution result corresponding to the constraint conditions.
  • The embodiments of the present invention describe six possible target models in detail.
  • In one possible implementation, the processing module 1002 may be specifically configured to: add, based on the target model, the first column data and the second column data row by row to obtain third column data, where the third column data has the same number of rows as the first column data; and determine the first bitmap data and the third column data as the execution result corresponding to the constraint condition.
  • In another possible implementation, when the constraint condition is to select data in the fourth column data that is greater than constant data, the processing module 1002 may be specifically configured to
  • compare, based on the target model, each row of the fourth column data with the constant data to obtain the execution result corresponding to the constraint condition.
  • The execution result includes second bitmap data.
  • In the second bitmap data, a row is marked visible if it is marked visible in the first bitmap data and the fourth column value in that row is greater than the constant data.
  • The second bitmap data has the same number of rows as the fourth column data.
  • In another possible implementation, the target model includes a first model and a second model; the target data includes first target data and second target data; the first target data includes fifth column data and sixth column data; and the fifth column data and the sixth column data have the same number of rows.
  • In this case, the processing module 1002 may be specifically configured to add, based on the first model, the fifth column data and the sixth column data row by row to obtain seventh column data, where the seventh column data has the same number of rows as the fifth column data; and,
  • when the constraint condition is to select data in the seventh column data that is greater than the eighth column data, compare, based on the second model, the seventh column data with the eighth column data row by row to obtain the execution result corresponding to the constraint condition.
  • The execution result includes third bitmap data, in which a row is marked visible if it is marked visible in the first bitmap data and the seventh column value in that row is greater than the eighth column value; the third bitmap data has the same number of rows as the seventh column data.
  • In another possible implementation, the processing module 1002 may be specifically configured to filter out, based on the target model, the hidden rows in the ninth column data to obtain the data in the ninth column that lies in visible rows; when the expression is a data summation, accumulate the data in the visible rows to obtain accumulated data; and determine the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.
  • In another possible implementation, the processing module 1002 may be specifically configured to filter out, based on the target model, the hidden rows in the tenth column data to obtain the data in the tenth column that lies in visible rows; mark, according to the first preset condition in the target model, the data in those visible rows to obtain marked eleventh column data; and determine the first bitmap data and the eleventh column data as the execution result corresponding to the constraint condition.
  • The processing module 1002 may be specifically configured to: mark, according to the first preset condition, the data in the tenth column data that lies in the first visible rows, and adjust the first visible rows to hidden rows; and
  • mark the data in the tenth column data that lies in the second visible rows, until the third preset condition is met, at which point marking ends and the marked eleventh column data is obtained.
  • In another possible implementation, the processing module 1002 may be specifically configured to: determine, according to the first bitmap data, the expression, and the target data, fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression, where the first expression or the second expression is related to any column of data in the target data; perform, based on the target model, a logical operation on the fourth bitmap data and the fifth bitmap data to obtain sixth bitmap data; and determine the sixth bitmap data as the execution result corresponding to the constraint condition.
  • The logical operation in the embodiments of the present invention includes at least one of the following: an AND gate, an OR gate, and a NOT gate.
  • In another possible implementation, the processing module 1002 may be specifically configured to input the expression and the target data into the target model and
  • process the expression and the target data in parallel to obtain the execution result corresponding to the constraint condition.
  • Fig. 11 is a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the data processing method and the data processing apparatus according to embodiments of the present invention.
  • The device may include a processor 1101 and a memory 1102 storing computer program instructions.
  • The processor 1101 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
  • The memory 1102 may include mass storage for data or instructions.
  • The memory 1102 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a universal serial bus (USB) drive, or a combination of two or more of these.
  • The memory 1102 may include removable or non-removable (fixed) media.
  • The memory 1102 may be internal or external to the integrated gateway device.
  • In a particular embodiment, the memory 1102 is a non-volatile solid-state memory.
  • In a particular embodiment, the memory 1102 includes read-only memory (ROM).
  • The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these.
  • The processor 1101 reads and executes the computer program instructions stored in the memory 1102 to implement any of the data processing methods in the foregoing embodiments.
  • The transceiver 1103 is mainly used to implement communication among the devices in the embodiments of the present invention, or with other devices.
  • The device may also include a bus 1104.
  • The processor 1101, the memory 1102, and the transceiver 1103 are connected through the bus 1104 and communicate with one another through it.
  • The bus 1104 includes hardware, software, or both.
  • The bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these.
  • The bus 1104 may include one or more buses.
  • an embodiment of the present invention further provides a computer-readable storage medium corresponding to the foregoing data processing method.
  • the computer-readable storage medium stores a computer program; when the computer program is executed in a computer, it causes the computer to perform the data processing method of the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method, apparatus, device, and storage medium. The method may include: first, obtaining a target model corresponding to an expression in a data query request, the expression containing a constraint condition (210); next, inputting the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition (220), where the target data includes all the data in at least one column and part of the data in at least one column; and then outputting the execution result (230). The invention addresses a problem in the related art: low data-processing efficiency slows down data queries issued as SQL statements.

Description

Data processing method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 201911259423.4, entitled "Data processing method, apparatus, device, and storage medium" and filed on December 10, 2019, the entire content of which is incorporated herein by reference.
Technical Field
The present invention relates to the field of communication technology, and in particular to a data processing method, apparatus, device, and storage medium.
Background
When a user queries data with Structured Query Language (SQL) statements, the query can be executed using the iterator model. In the iterator model, an expression takes an abstract iterator as input. It uses the iterator to fetch the data of each table in the database one row at a time, computes a result for each row, and outputs the results of all the rows, thereby obtaining the requested data.
However, this abstract iterator incurs a performance penalty. During iteration, fetching each row triggers several layers of function calls. Fetching data row by row also causes excessive writes and outputs, which consume considerable resources. The resulting computational overhead lowers data-processing efficiency and slows down SQL queries.
Summary
One or more embodiments of the present invention describe a data processing method, apparatus, device, and storage medium. They address a problem in the related art: low data-processing efficiency slows down data queries issued as SQL statements.
To solve the above technical problem, the present invention is implemented as follows:
According to a first aspect, a data processing method is provided. The method may include:
obtaining a target model corresponding to an expression in a data query request, the expression containing a constraint condition;
inputting the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, where the target data includes all the data in at least one column and part of the data in at least one column; and
outputting the execution result.
According to a second aspect, a data processing apparatus is provided. The apparatus may include:
an obtaining module, configured to obtain a target model corresponding to an expression in a data query request, the expression containing a constraint condition;
a processing module, configured to input the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, where the target data includes all the data in at least one column and part of the data in at least one column; and
an output module, configured to output the execution result.
According to a third aspect, a computing device is provided. The device includes at least one processor and a memory. The memory stores computer program instructions, and the processor executes the program in the memory to control the computing device to implement the data processing method of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed in a computer, it causes the computer to perform the data processing method of the first aspect.
In the embodiments of the present invention, a target model corresponding to an expression in a data query request is obtained, the expression containing a constraint condition. The expression and target data in a columnar database are then input into the target model to obtain an execution result corresponding to the constraint condition. The embodiments exploit the ability of a columnar database to fetch a whole column of data quickly, which reduces the number of writes and outputs during expression evaluation. This greatly reduces both the time spent fetching data and the amount of computation, which improves data-processing efficiency and therefore the query speed of SQL statements.
Brief Description of the Drawings
The present invention can be better understood from the following description of its specific embodiments in conjunction with the accompanying drawings, in which the same or similar reference signs indicate the same or similar features.
Fig. 1 is a schematic diagram of the architecture of a data processing method according to an embodiment;
Fig. 2 is a flowchart of a data processing method according to an embodiment;
Fig. 3 is a flowchart of data processing implemented by a first target model according to an embodiment;
Fig. 4 is a flowchart of data processing implemented by a second target model according to an embodiment;
Fig. 5 is a flowchart of data processing implemented by a first target model and a second target model according to an embodiment;
Fig. 6 is a flowchart of data processing implemented by a third target model according to an embodiment;
Fig. 7 is a flowchart of data processing implemented by a fourth target model according to an embodiment;
Fig. 8 is a flowchart of data processing implemented by a fifth target model according to an embodiment;
Fig. 9 is a flowchart of another implementation of data processing by a fifth target model according to an embodiment;
Fig. 10 is a structural block diagram of a data processing apparatus according to an embodiment;
Fig. 11 is a schematic structural diagram of a computing device according to an embodiment.
Detailed Description
The features and exemplary embodiments of each aspect of the present invention are described in detail below. To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. A person skilled in the art can implement the present invention without some of these specific details. The following description of the embodiments only aims to give a better understanding of the present invention by showing examples of it.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another. They do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
At present, in the iterator model, an expression takes an abstract iterator as input and produces a computed result as output. When iterators are used to evaluate expressions, the data of a table in a row-oriented database is processed row by row. For each row of the table, the iterator fetches that row's data, the computation is performed, and the result is written back. After one row has been processed, the iterator moves to the next row and the same computation is repeated. This forms a loop that continues until the iterator has scanned all rows. The loop then terminates, expression evaluation moves on to the next model, and a similar loop runs on that model. Because the iterator abstracts the entire table, every fetch through it passes through multiple layers of function calls, which degrades overall processing performance. In addition, every move of the iterator involves a data fetch, which often entails excessive writes and outputs. The resulting large overhead and excessive computation lower data-processing efficiency and slow down SQL queries.
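To make the per-row cost concrete, the following is a minimal Python sketch of row-at-a-time evaluation under the iterator model. It rests on two assumptions. First, the table is modeled as a list of row dictionaries. Second, the names `ScanIterator`, `next_row`, and `eval_add_row_at_a_time` are hypothetical and do not come from the patent or any particular database. The point is only that every row costs one iterator call plus one scalar computation.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]


class ScanIterator:
    """Hypothetical row iterator over a row-oriented table (a list of rows)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._pos = 0
        self.calls = 0  # number of next_row() calls made so far

    def next_row(self) -> Optional[Row]:
        """Return the next row, or None once every row has been scanned."""
        self.calls += 1
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row


def eval_add_row_at_a_time(it: ScanIterator, left: str, right: str) -> List[int]:
    """Evaluate `left + right` with one iterator call per row, as in the iterator model."""
    results: List[int] = []
    while True:
        row = it.next_row()  # one call (and one fetch) per row
        if row is None:      # the iterator has swept all rows: the loop terminates
            break
        results.append(row[left] + row[right])  # compute, then write the result back
    return results
```

For a three-row table, the loop makes four `next_row()` calls: three rows plus the end-of-data check. In a real engine, each of those calls may pass through several operator layers, which is the overhead described above.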
To solve the above technical problems, embodiments of the present invention provide a data processing method, apparatus, device, and storage medium, as described below.
First, a data processing architecture provided by an embodiment of the present invention is described.
As shown in Fig. 1, the architecture may include a target node and a columnar database. When the target node receives a data query request, it determines the target model corresponding to the expression in the request, the expression containing a constraint condition. It then inputs the expression and target data from the columnar database into the target model to obtain an execution result corresponding to the constraint condition, where the target data includes all the data in at least one column and part of the data in at least one column. Finally, it outputs the execution result.
The columnar database in the embodiments of the present invention may include the Histore columnar database, which evolved from an open-source database. Each table in this database is stored by column, so all rows of a given column can be fetched quickly during expression evaluation. This makes the database naturally suited to batch processing of data. The new batch-based target model in the embodiments of the present invention exploits this property. It divides the data of a table into batches, and each expression processes data one batch at a time, which speeds up evaluation of the whole expression.
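The following is a minimal sketch of why column-wise storage lends itself to batches, using a toy in-memory table. `ColumnTable` and `batches` are hypothetical names and not Histore's API, and the batch size is an arbitrary parameter chosen for illustration. Reading a batch of one column is a single slice of a contiguous list, rather than one iterator call per row.

```python
from typing import Dict, Iterator, List


class ColumnTable:
    """Hypothetical column store: each column is kept as its own contiguous list."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = columns
        self.num_rows = lengths.pop() if lengths else 0

    def batches(self, name: str, batch_size: int) -> Iterator[List[int]]:
        """Yield one column in fixed-size batches (the last batch may be shorter)."""
        column = self.columns[name]
        for start in range(0, self.num_rows, batch_size):
            yield column[start:start + batch_size]
```

With `batch_size=2`, a five-row column `a` comes back as three batches: `[1, 2]`, `[3, 4]`, and `[5]`.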
In addition, the above architecture can be applied to scenarios in which users query data with SQL statements, or to any scenario in which expressions are evaluated based on the above target model.
Thus, the target model corresponding to the expression in a data query request is obtained, and the expression and target data in the columnar database are input into that model, which yields the execution result corresponding to the expression's constraint condition. The embodiments of the present invention exploit the ability of a columnar database to fetch a whole column quickly. This reduces the number of writes and outputs during expression evaluation, greatly reduces data-fetching time and computation, and thereby improves data-processing efficiency.
Based on the above architecture and application scenarios, the data processing method provided by the embodiments of the present invention is further described below with reference to Figs. 2 to 9.
Fig. 2 is a flowchart of a data processing method according to an embodiment.
As shown in Fig. 2, the method may include steps 210 to 230. In step 210, a target model corresponding to the expression in a data query request is obtained, the expression containing a constraint condition. In step 220, the expression and target data in a columnar database are input into the target model to obtain an execution result corresponding to the constraint condition. In step 230, the execution result is output.
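Steps 210 to 230 can be sketched as a small pipeline. The sketch makes several assumptions:

- A registry (`MODELS`, hypothetical) maps an expression's operator to a target model.
- Each target model takes one batch of each input column.
- Constraint conditions and bitmaps are left out, so that only the control flow of the three steps remains.

The function names and the default batch size are illustrative, not taken from the patent.

```python
from typing import Callable, Dict, List

Column = List[int]
# Hypothetical target model: maps one batch of each input column to a batch of results.
TargetModel = Callable[[Column, Column], Column]

# Hypothetical registry from an expression's operator to its target model.
MODELS: Dict[str, TargetModel] = {
    "+": lambda a, b: [x + y for x, y in zip(a, b)],
    "*": lambda a, b: [x * y for x, y in zip(a, b)],
}


def run_query(op: str, a: Column, b: Column, batch_size: int = 1024) -> Column:
    """Steps 210-230 for a binary expression `a <op> b`, evaluated batch by batch."""
    if len(a) != len(b):
        raise ValueError("columns must have the same number of rows")
    model = MODELS[op]  # step 210: obtain the target model for the expression
    result: Column = []
    for start in range(0, len(a), batch_size):
        end = start + batch_size
        # step 220: feed one batch of each column into the target model
        result.extend(model(a[start:end], b[start:end]))
    return result  # step 230: output the execution result
```

A dictionary lookup is used here only because it is the simplest way to show "one target model per expression". An engine could equally resolve the model while planning the query.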
Each of these steps is described in detail below.
First, regarding step 210: the target node receives a data query request.
The data query request includes an expression, such as "count the clicks on a product" or "query the visits to a platform". The expression may include at least one constraint condition. For example, the expression "count the clicks on a product" may include constraint conditions such as "count the clicks on the product from January to June" or "select the data whose click count is between 1 and 1000".
After the data query request is received, the method provided by the embodiment of the present invention may further include determining, according to the data query request, the target model corresponding to the expression in the request.
Each expression has its own corresponding target model. The embodiments of the present invention provide six target models, which are described in detail below in conjunction with step 220.
Second, regarding step 220: the target data in the embodiments of the present invention includes all the data in at least one column and part of the data in at least one column. In addition, data in the columnar database is stored by column, and each column separately stores the data of at least one attribute.
On this basis, in one possible embodiment, this step may specifically include: inputting the expression and the target data into the target model; and
based on the target model, processing the expression and the target data in parallel using a Single Instruction Multiple Data (SIMD) instruction set, to obtain the execution result corresponding to the constraint condition.
Computing in batches in this way makes full use of the SIMD instruction set for expression evaluation, which speeds up computation.
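Python cannot issue SIMD instructions, so the following sketch only emulates the structure that makes batch computation SIMD-friendly. It processes a batch in fixed-width groups of `LANES` values, one "vector instruction" per group, and handles the leftover rows in a scalar tail loop. Both `LANES = 8` (for example, eight 32-bit lanes in a 256-bit register) and the helper names `vector_add` and `batch_add` are assumptions. A native engine would replace `vector_add` with a real SIMD instruction or rely on compiler auto-vectorization.

```python
from typing import List

LANES = 8  # assumed register width in elements, e.g. 8 x 32-bit lanes in 256 bits


def vector_add(x: List[int], y: List[int]) -> List[int]:
    """Stands in for a single SIMD add over LANES elements."""
    return [p + q for p, q in zip(x, y)]


def batch_add(a: List[int], b: List[int]) -> List[int]:
    """Add two equal-length column batches, LANES elements per emulated instruction."""
    if len(a) != len(b):
        raise ValueError("batches must have the same number of rows")
    out: List[int] = []
    full = len(a) - len(a) % LANES
    for i in range(0, full, LANES):  # "vectorized" body: whole groups of LANES rows
        out.extend(vector_add(a[i:i + LANES], b[i:i + LANES]))
    for i in range(full, len(a)):    # scalar tail for the rows that do not fill a group
        out.append(a[i] + b[i])
    return out
```

For a 20-row batch, this performs two emulated vector adds (rows 0 to 15) and four scalar adds (rows 16 to 19). A row-at-a-time loop would instead perform 20 scalar adds.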
In another possible embodiment, this step may specifically include: inputting first bitmap data, the expression, and the target data into the target model;
filtering the target data with the first bitmap data to obtain the data that satisfies the constraint condition; and
based on the target model and the first bitmap data, processing the data that satisfies the constraint condition to obtain the execution result corresponding to the constraint condition.
The incoming first bitmap data (the filter mask) is applied during processing, so data that does not satisfy the constraint condition is filtered out first. This reduces the number of expression operations.
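The following minimal sketch shows how a filter mask cuts the number of expression operations. It assumes the bitmap is a list of 0/1 flags with one entry per row (1 = visible, 0 = hidden) rather than a packed bit array. `apply_with_mask` is a hypothetical helper, and leaving hidden rows as `None` is a choice made only for this sketch.

```python
from typing import Callable, List, Optional, Tuple


def apply_with_mask(
    mask: List[int], column: List[int], fn: Callable[[int], int]
) -> Tuple[List[Optional[int]], int]:
    """Apply `fn` only to visible rows (mask bit 1) and count the operations performed."""
    if len(mask) != len(column):
        raise ValueError("mask and column must have the same number of rows")
    out: List[Optional[int]] = [None] * len(column)
    ops = 0
    for i, bit in enumerate(mask):
        if bit:  # hidden rows (bit 0) are never evaluated
            out[i] = fn(column[i])
            ops += 1
    return out, ops
```

With the mask `[1, 0, 1, 0]`, only two of the four rows are evaluated, so the operation count halves.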
Note that the second possible embodiment can be built on the first one, that is: inputting the first bitmap data, the expression, and the target data into the target model;
filtering the target data with the first bitmap data to obtain the data that satisfies the constraint condition; and
based on the target model and the first bitmap data, processing the expression and the target data in parallel using the Single Instruction Multiple Data (SIMD) instruction set, to obtain the execution result corresponding to the constraint condition.
In this way, the target data is processed in batches, which takes full advantage of the columnar database's ability to fetch a column quickly. Batch processing reduces the number of writes and outputs during expression evaluation, greatly shortens query time, and is cache-friendly. Together, these optimizations improve the overall performance of the expression. In addition, the incoming filter mask is applied during processing, so data that does not satisfy the constraint condition is filtered out first; this greatly reduces the number of expression operations. Computing in batches also makes full use of the SIMD instruction set for expression evaluation, which speeds up computation.
On this basis, the following describes in detail, with reference to at least one of the target models below, the step "based on the target model and the first bitmap data, processing the data that satisfies the constraint condition to obtain the execution result corresponding to the constraint condition".
(1) When the target model is the first target model, the target data includes first column data and second column data, and the first column data and the second column data have the same number of rows, this step may specifically include:
based on the first target model, adding the values in the same row of the first column data and the second column data to obtain third column data, where the third column data has the same number of rows as the first column data; and
determining the first bitmap data and the third column data as the execution result corresponding to the constraint condition.
The input of the first target model may include the first bitmap data, the expression, the first column data, and the second column data. With the above processing, the output of the first target model may include the first bitmap data and the third column data.
For example, suppose the first target model is a summation model. Fig. 3 shows the tree structure of the expression a+b and its implementation model, where a and b are any two columns of the target data, namely the first column data and the second column data. The operator "+", the mask (bitmap data), a, and b are taken as input; the values of a and b in each row are added, and the result is output. In the batch-based first target model, the expression receives a filter mask, a batch of data from column a, and a batch of data from column b. The mask filters out rows that have already been filtered out, that is, rows that do not satisfy the constraint condition and/or rows that satisfy an already-marked preset condition. The remaining rows are then added in batch, and finally a mask and the result a+b for this batch are output. This expression does not change the input mask, because it marks no additional rows as hidden (0). The input mask can therefore be output directly.
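The following is a minimal Python sketch of the first target model. It assumes that a batch is a list of integers and that the mask is a list of 0/1 flags of the same length. `add_model` is a hypothetical name, and filling hidden rows with `None` is a choice made only for this sketch; a native engine might simply leave those slots undefined.

```python
from typing import List, Optional, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row


def add_model(mask: Mask, a: List[int], b: List[int]) -> Tuple[Mask, List[Optional[int]]]:
    """Sketch of the first target model: compute a + b for one batch under a filter mask."""
    if not (len(mask) == len(a) == len(b)):
        raise ValueError("mask, a and b must have the same number of rows")
    # Hidden rows are skipped; None marks "no value computed" for them.
    result = [x + y if bit else None for bit, x, y in zip(mask, a, b)]
    # The expression hides no additional rows, so the input mask passes through unchanged.
    return mask, result
```

The output mask is the same object as the input mask. This mirrors the text's observation that a+b never changes which rows are visible.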
(2) When the target model is the second target model, the target data includes fourth column data and constant data, and the fourth column data and the constant data have the same number of rows, this step may specifically include:
when the constraint condition is to select the values in the fourth column data that are greater than the constant, comparing, based on the second target model, the fourth column data with the constant data row by row to obtain the execution result corresponding to the constraint condition;
where the execution result includes second bitmap data. The second bitmap data is obtained from the first bitmap data by marking as visible the rows in which the fourth column value is greater than the constant, and it has the same number of rows as the fourth column data.
The input of the second target model may include the first bitmap data, the expression, the fourth column data, and the constant data. With the above processing, the output of the second target model may include the second bitmap data.
For example, Fig. 4 shows the tree structure of the expression a > 3, where a is the fourth column data and 3 is the constant data, together with its implementation model. In the batch-based second target model, the expression receives three inputs: a mask, a batch of data from column a, and a batch of the constant 3 with as many values as the batch of a. The output mask is based on the input mask. The mask first filters out some rows (rows that do not satisfy the constraint condition), and a > 3 is then compared in batch. Any row whose value is less than or equal to 3 is reset in the result mask. That is, rows that fail are marked as hidden (0), and rows that satisfy both the constraint condition and the comparison are marked as visible (1). Finally, the result mask is output; its visible rows (1) are exactly those that satisfy both the constraint condition and the comparison.
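The second target model can be sketched in the same style. As before, this assumes 0/1 list masks, and `greater_than_model` is a hypothetical name. The constant is materialized as a batch with the same number of values as the batch of `a`, which mirrors the description. The function builds a new result mask from the input mask rather than editing the input mask in place; this is a choice made for the sketch.

```python
from typing import List

Mask = List[int]  # 1 = visible row, 0 = hidden row


def greater_than_model(mask: Mask, a: List[int], constant: int) -> Mask:
    """Sketch of the second target model: evaluate `a > constant` under a filter mask."""
    if len(mask) != len(a):
        raise ValueError("mask and column must have the same number of rows")
    # Materialize the constant as a batch with as many values as the batch of `a`.
    constants = [constant] * len(a)
    # Start from the input mask: hidden rows stay 0, and a visible row is reset to 0
    # when its value is <= the constant; otherwise it stays 1.
    return [1 if bit and x > c else 0 for bit, x, c in zip(mask, a, constants)]
```

For the input mask `[1, 1, 0, 1]` and the batch `[5, 2, 9, 3]`, only the first row survives. Row 2 is already hidden, so its value 9 is never considered.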
(3) This case applies when the target model comprises both the first target model and the second target model, referred to below as the first model and the second model. The target data includes first target data and second target data, the first target data includes fifth column data and sixth column data, and the fifth column data and the sixth column data have the same number of rows. In this case, this step may specifically include:
based on the first model, adding the values in the same row of the fifth column data and the sixth column data to obtain seventh column data, where the seventh column data has the same number of rows as the fifth column data;
inputting the first bitmap data, the seventh column data, and the second target data into the second model, where the second target data includes eighth column data, and the eighth column data and the seventh column data have the same number of rows; and
when the constraint condition is to select the values in the seventh column data that are greater than the corresponding values in the eighth column data, comparing, based on the second model, the seventh column data with the eighth column data row by row to obtain the execution result corresponding to the constraint condition;
where the execution result includes third bitmap data. The third bitmap data is obtained from the first bitmap data by marking as visible the rows in which the seventh column value is greater than the eighth column value, and it has the same number of rows as the seventh column data.
For example, Fig. 5 shows the combined expression (a+b) > c, where c is the second target data. This expression combines (1) and (2). The mask and the a+b result output by (1) serve as input to the expression (a+b) > c, together with a batch of the same size taken from c. Finally, (a+b) > c is evaluated in batch to produce a result mask.
Thus, based on the first target model and the second target model, the incoming first bitmap data (mask) is used for filtering during processing. The resulting mask is then passed to the next node, which may be the second target model and/or another target node. This greatly reduces the number of expression operations.
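The composition of (1) and (2) can be sketched by chaining two nodes, where the mask and values output by the first node become the input of the second. The sketch assumes 0/1 list masks. The helper names `add_node`, `greater_node`, and `sum_greater_than` are hypothetical, and the helpers are redefined here so that the block stands alone.

```python
from typing import List, Optional, Tuple

Mask = List[int]  # 1 = visible row, 0 = hidden row


def add_node(mask: Mask, a: List[int], b: List[int]) -> Tuple[Mask, List[Optional[int]]]:
    """First model: a + b on visible rows; the mask passes through unchanged."""
    return mask, [x + y if bit else None for bit, x, y in zip(mask, a, b)]


def greater_node(mask: Mask, lhs: List[Optional[int]], rhs: List[int]) -> Mask:
    """Second model: a visible row stays visible only if lhs > rhs."""
    return [1 if bit and x is not None and x > y else 0 for bit, x, y in zip(mask, lhs, rhs)]


def sum_greater_than(mask: Mask, a: List[int], b: List[int], c: List[int]) -> Mask:
    """Evaluate (a + b) > c for one batch by feeding node 1's output into node 2."""
    if not (len(mask) == len(a) == len(b) == len(c)):
        raise ValueError("all inputs must have the same number of rows")
    mask1, a_plus_b = add_node(mask, a, b)  # node 1 filters with the incoming mask
    return greater_node(mask1, a_plus_b, c)  # node 2 consumes node 1's mask and values
```

Rows hidden by the incoming mask are skipped by both nodes, so neither the addition nor the comparison is performed for them.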
(4) This step may specifically include the following when these conditions hold: the target model is the third target model, the target data includes a ninth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the ninth column of data have the same number of rows:
Based on the target model, using the hidden rows to filter out the hidden rows of the ninth column of data, which yields the data in the ninth column whose rows correspond to the visible rows;
When the expression is a data sum, accumulating the data in the rows corresponding to the visible rows to obtain accumulated data;
Determining the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.
The input of the third target model may include the first bitmap data, the expression, and the ninth column of data. With the above processing, the output of the third target model may include the accumulated data and the first bitmap data.
For example, Figure 6 shows the tree structure of an aggregation expression SUM and its implementation model. The expression takes a filtering mask and the ninth column of data a as input. The rows marked 0 in the mask are used to filter out the corresponding rows of the ninth column. The remaining data (the rows marked 1, i.e., the visible rows) are then summed. The expression finally outputs the mask together with the sum of this batch of data, sum(a).
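A minimal sketch of the Figure 6 SUM node follows, under the same assumption that a mask is a list of 0/1 flags. The name `masked_sum` is hypothetical. The node adds up only the visible rows and passes the mask through unchanged. Per-batch results are plain partial sums, so they can be combined by adding them.

```python
from typing import List, Tuple


def masked_sum(mask: List[int], column: List[int]) -> Tuple[List[int], int]:
    """Sketch of a SUM node: adds up only the rows marked visible (1) in the
    mask and passes the mask through unchanged, returning (mask, sum)."""
    if len(mask) != len(column):
        raise ValueError("mask and column must have the same number of rows")
    total = sum(value for visible, value in zip(mask, column) if visible)
    return mask, total


# Batches can be summed independently and the partial sums combined.
batches = [([1, 0, 1, 1], [10, 20, 30, 40]), ([0, 1], [5, 6])]
partials = [masked_sum(m, col)[1] for m, col in batches]
print(partials, sum(partials))  # [80, 6] 86
```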
(5) This step may specifically include the following when these conditions hold: the target model is the fourth target model, the target data includes a tenth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the tenth column of data have the same number of rows:
Based on the target model, using the hidden rows to filter out the hidden rows of the tenth column of data, which yields the data in the tenth column whose rows correspond to the visible rows;
According to a first preset condition in the target model, marking the data in the tenth column whose rows correspond to the visible rows, to obtain a marked eleventh column of data;
Determining the first bitmap data and the eleventh column of data as the execution result corresponding to the constraint condition.
Further, the target model may include the first preset condition and a second preset condition, and the visible rows may include first visible rows and second visible rows. In that case,
marking the data in the tenth column whose rows correspond to the visible rows according to the first preset condition, to obtain the marked eleventh column of data, includes:
According to the first preset condition, marking the data in the tenth column whose rows correspond to the first visible rows, and changing the first visible rows to hidden rows;
According to the second preset condition, marking the data in the tenth column whose rows correspond to the second visible rows. Marking continues in this way until a third preset condition is met; marking then ends, and the marked eleventh column of data is obtained.
The input of the fourth target model may include three items. The first is the first bitmap data; in this example, it includes hidden rows marked 0 and visible rows marked 1. The second is the expression, which carries three constraint conditions. The third is the tenth column of data. With the above processing, the output of the fourth target model may include the first bitmap data and the eleventh column of data.
For example, Figure 7 shows the tree structure of a conditional function (CASE WHEN) expression and its implementation model. The expression contains three branches, i.e., conditional judgments (WHEN ... THEN, plus a default THEN).

Initially, the mask filters out part of the data. These are the rows shown as black squares, i.e., hidden rows marked 0.

In the first conditional judgment (WHEN a), the rows shown as gray squares satisfy the condition. Those rows are assigned the conditional result 5. They are also filtered out (marked as hidden rows 0), so they are not considered in the next conditional judgment.

In the second conditional judgment (WHEN b), the rows shown as black squares are filtered out first. Evaluation then finds that the rows shown as gray squares satisfy the second condition. Those rows are assigned the conditional result 6 and are likewise filtered out (marked as hidden rows 0), so they are not considered in the next conditional judgment.

In the last conditional judgment, all remaining rows (the rows shown as white squares) are taken by default and assigned the conditional result 7.

Finally, the expression outputs a mask and the conditional results for all rows that satisfy the constraint conditions. Here, default denotes the result chosen when none of the conditions is satisfied.
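The CASE WHEN walk-through of Figure 7 can be sketched as follows. This is a simplification, and the function `case_when` and its parameters are hypothetical. Every WHEN predicate is assumed to read a single column, whereas the patent's conditions a and b may be arbitrary subexpressions. The sketch also assumes that the incoming mask is returned unchanged alongside the result column. A row that matches a branch is hidden from all later branches, so later predicates run on fewer and fewer rows.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Branch = Tuple[Callable[[int], bool], int]  # (WHEN predicate, THEN result)


def case_when(
    mask: List[int],
    column: List[int],
    branches: Sequence[Branch],
    default: int,
) -> Tuple[List[int], List[Optional[int]]]:
    """CASE WHEN p1 THEN r1 WHEN p2 THEN r2 ... ELSE default over one column.

    Rows hidden by the incoming mask are never evaluated and get None.
    A row that matches a branch is hidden from all later branches.
    Returns the incoming mask unchanged together with the result column.
    """
    if len(mask) != len(column):
        raise ValueError("mask and column must have the same number of rows")
    remaining = list(mask)  # working copy: rows not yet decided
    out: List[Optional[int]] = [None] * len(column)
    for predicate, result in branches:
        for i, value in enumerate(column):
            # `and` short-circuits, so the predicate runs only on rows still visible.
            if remaining[i] and predicate(value):
                out[i] = result
                remaining[i] = 0  # hide the row from the following branches
    for i in range(len(column)):
        if remaining[i]:  # no WHEN matched: apply the default (ELSE) result
            out[i] = default
    return mask, out


print(case_when([1, 0, 1, 1, 1], [3, 8, 1, 0, 9],
                [(lambda x: x > 5, 5), (lambda x: x > 0, 6)], default=7))
# ([1, 0, 1, 1, 1], [6, None, 6, 7, 5])
```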
(6) When the target model is the fifth target model, this step may specifically include:
Determining, according to the first bitmap data, the expression, and the target data, fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression, where the first expression or the second expression relates to any column of the target data;
Based on the target model, performing a logical operation on the fourth bitmap data and the fifth bitmap data to obtain sixth bitmap data;
Determining the sixth bitmap data as the execution result corresponding to the constraint condition.
The logical operation includes at least one of the following: AND, OR, and NOT.
Note that the input of the fifth target model may include the fourth bitmap data corresponding to the first expression and the fifth bitmap data corresponding to the second expression. These bitmaps can be determined as in (2) above. The result mask that (2) outputs for input a can be taken as the fourth bitmap data (mask) corresponding to the first expression a. Likewise, if the input to (2) is b, the result mask output for b can be taken as the fifth bitmap data (mask) corresponding to the second expression b.
On this basis, Figure 8 shows the case where the logical operation is AND. It gives the tree structure of an expression containing AND and its implementation model. The expression in the fifth target model receives the result masks of two subexpressions as input: the mask output by expression a and the mask output by expression b. It ANDs the two input masks and outputs the resulting mask.
Figure 9 shows the case where the logical operation is OR. It gives the tree structure of an expression containing OR and its implementation model. The expression in the fifth target model receives the result masks of two subexpressions as input: the mask output by expression a and the mask output by expression b. It ORs the two input masks and outputs the resulting mask.
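The AND/OR nodes of Figures 8 and 9 (and the NOT operation listed above) can be sketched on packed bitmaps. In this sketch, bit i of a Python `int` stands for row i, so one machine-level operation combines many rows at once. The helper names `pack`, `unpack`, `mask_and`, `mask_or`, and `mask_not` are hypothetical. `mask_not` also takes a `valid` bitmap. This reflects an assumption of this sketch: inverting a mask should not bring back rows that the incoming filter had already hidden.

```python
from typing import List


def pack(mask: List[int]) -> int:
    """Pack a 0/1 row mask into an int bitmap; bit i corresponds to row i."""
    bits = 0
    for i, visible in enumerate(mask):
        if visible:
            bits |= 1 << i
    return bits


def unpack(bits: int, n: int) -> List[int]:
    """Expand an int bitmap back into a list of n 0/1 flags."""
    return [(bits >> i) & 1 for i in range(n)]


def mask_and(x: int, y: int) -> int:
    return x & y  # row visible only if both subexpressions kept it


def mask_or(x: int, y: int) -> int:
    return x | y  # row visible if either subexpression kept it


def mask_not(x: int, valid: int) -> int:
    # Invert, but never resurrect rows that the incoming filter (valid) hid.
    return ~x & valid


mask_a = pack([1, 0, 1, 1])  # result mask of subexpression a
mask_b = pack([1, 1, 0, 1])  # result mask of subexpression b
print(unpack(mask_and(mask_a, mask_b), 4))  # [1, 0, 0, 1]
print(unpack(mask_or(mask_a, mask_b), 4))   # [1, 1, 1, 1]
print(unpack(mask_not(mask_a, pack([1, 1, 1, 0])), 4))  # [0, 1, 0, 0]
```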
Thus, with the second and fifth target models, the incoming filtering first bitmap data (mask) is applied during processing, and the result mask is passed to the next node. This greatly reduces the number of expression evaluations and speeds up processing.
In summary, the embodiments of the present invention work as follows. A target model corresponding to an expression in a data query request is obtained, where the expression contains a constraint condition. The expression and the target data in a columnar database are then input into the target model to obtain an execution result corresponding to the constraint condition. The embodiments make full use of the fact that a columnar database can fetch a given column of data quickly. They also reduce the number of writes and outputs during expression evaluation. This greatly reduces data-access time and computation and improves data-processing efficiency, which speeds up SQL queries.
In addition, the incoming filter mask is used during processing so that data not satisfying the constraint conditions is filtered out first, which greatly reduces the number of expression evaluations. Computing in batches also makes full use of the SIMD (single instruction, multiple data) instruction set for expression evaluation, which speeds up computation.
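Batch-at-a-time evaluation can be sketched as follows. Python itself does not emit SIMD instructions, so this only illustrates the control structure under stated assumptions. A column is split into fixed-size batches, the same simple operation is applied across each contiguous batch, and one result mask is produced per batch. In a compiled language, this loop shape is what a compiler or hand-written intrinsics can map onto SIMD compare instructions. The names `batch_ranges` and `greater_than_batched` are hypothetical, and the batch size of 4 is chosen only to keep the example small.

```python
from typing import Iterator, List, Tuple


def batch_ranges(n: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) row ranges covering n rows in batches of `size`."""
    for start in range(0, n, size):
        yield start, min(start + size, n)


def greater_than_batched(column: List[int], const: int, size: int) -> List[List[int]]:
    """Evaluate `column > const` one batch at a time, one result mask per batch."""
    masks = []
    for start, end in batch_ranges(len(column), size):
        batch = column[start:end]
        # One uniform comparison over a contiguous block of values: the loop
        # shape that compiled code can vectorize with SIMD compares.
        masks.append([int(x > const) for x in batch])
    return masks


print(greater_than_batched([1, 5, 3, 7, 9, 2], 4, size=4))  # [[0, 1, 0, 1], [1, 0]]
```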
Based on the above data processing method, an embodiment of the present invention provides a data processing apparatus. Figure 10 shows a structural block diagram of a data processing apparatus according to an embodiment.
As shown in Figure 10, the apparatus 1000 may specifically include:
an obtaining module 1001, configured to obtain a target model corresponding to an expression in a data query request, the expression containing a constraint condition;
a processing module 1002, configured to input the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, where the target data includes all of the data in at least one column and part of the data in the at least one column;
an output module 1003, configured to output the execution result.
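The three-module structure of apparatus 1000 can be sketched as a small class. This is only an illustration of the division of responsibilities. Several things are assumptions of this sketch rather than details from the source: the class, its method names, the `sum_model`/`gt_model` helpers, and representing an "expression" by a string key that selects a registered target model.

```python
from typing import Callable, Dict, List, Tuple

Mask = List[int]
Columns = List[List[int]]
Model = Callable[[Mask, Columns], object]


def sum_model(mask: Mask, cols: Columns) -> Tuple[Mask, int]:
    """SUM(col0) over visible rows; the mask is passed through."""
    return mask, sum(v for m, v in zip(mask, cols[0]) if m)


def gt_model(mask: Mask, cols: Columns) -> Mask:
    """col0 > col1 on visible rows; returns the new mask."""
    return [int(bool(m) and x > y) for m, x, y in zip(mask, cols[0], cols[1])]


class DataProcessingApparatus:
    def __init__(self, models: Dict[str, Model]) -> None:
        self.models = models  # hypothetical registry: expression kind -> target model

    def obtain(self, expr_kind: str) -> Model:
        """Obtaining module (1001): look up the target model for an expression."""
        return self.models[expr_kind]

    def process(self, expr_kind: str, mask: Mask, cols: Columns) -> object:
        """Processing module (1002): feed the mask and target columns to the model."""
        return self.obtain(expr_kind)(mask, cols)

    def output(self, result: object) -> object:
        """Output module (1003): here it simply prints and returns the result."""
        print(result)
        return result


app = DataProcessingApparatus({"sum": sum_model, "gt": gt_model})
app.output(app.process("sum", [1, 0, 1], [[1, 2, 3]]))            # ([1, 0, 1], 4)
app.output(app.process("gt", [1, 1, 0], [[5, 1, 7], [3, 3, 3]]))  # [1, 0, 0]
```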
In a possible embodiment, the processing module 1002 may specifically be configured to input the first bitmap data, the expression, and the target data into the target model;
filter the target data using the first bitmap data to obtain data satisfying the constraint condition;
and, based on the target model and the first bitmap data, process the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition.
Further, the embodiments of the present invention describe six possible target models in detail.
(1) The target data may include a first column of data and a second column of data with the same number of rows. In this case, the processing module 1002 may specifically be configured to add, based on the target model, the first and second columns row by row to obtain a third column of data with the same number of rows as the first column. It may then determine the first bitmap data and the third column of data as the execution result corresponding to the constraint condition.
(2) The target data may include a fourth column of data and constant data with the same number of rows, with the constraint condition being to select the data in the fourth column that is greater than the constant data. In this case, the processing module 1002 may specifically be configured to compare, based on the target model, the fourth column with the constant data row by row to obtain the execution result corresponding to the constraint condition;
where the execution result includes second bitmap data. The second bitmap data is obtained by marking as visible rows, on the first bitmap data, the rows in which the fourth column is greater than the constant data. The second bitmap data has the same number of rows as the fourth column of data.
(3) In this case, the target model includes a first model and a second model, and the target data includes first target data and second target data. The first target data includes a fifth column of data and a sixth column of data with the same number of rows. The processing module 1002 may specifically be configured to add, based on the first model, the fifth and sixth columns row by row to obtain a seventh column of data with the same number of rows as the fifth column;
input the first bitmap data, the seventh column of data, and the second target data into the second model, where the second target data includes an eighth column of data with the same number of rows as the seventh column;
and, when the constraint condition is to select the data in the seventh column that is greater than the eighth column, compare, based on the second model, the seventh and eighth columns row by row to obtain the execution result corresponding to the constraint condition;
where the execution result includes third bitmap data. The third bitmap data is obtained by marking as visible rows, on the first bitmap data, the rows in which the seventh column is greater than the eighth column. The third bitmap data has the same number of rows as the seventh column of data.
(4) In this case, the target data includes a ninth column of data, and the first bitmap data includes hidden rows and visible rows. The first bitmap data and the ninth column have the same number of rows. The processing module 1002 may specifically be configured to perform the following steps. First, based on the target model, it uses the hidden rows to filter out the hidden rows of the ninth column, which yields the data in the ninth column whose rows correspond to the visible rows. Next, when the expression is a data sum, it accumulates that data to obtain accumulated data. Finally, it determines the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.
(5) In this case, the target data includes a tenth column of data, and the first bitmap data includes hidden rows and visible rows. The first bitmap data and the tenth column have the same number of rows. The processing module 1002 may specifically be configured to perform the following steps. First, based on the target model, it uses the hidden rows to filter out the hidden rows of the tenth column, which yields the data in the tenth column whose rows correspond to the visible rows. Next, according to the first preset condition in the target model, it marks that data to obtain a marked eleventh column of data. Finally, it determines the first bitmap data and the eleventh column of data as the execution result corresponding to the constraint condition.
Further, the target model may include the first preset condition and a second preset condition, and the visible rows may include first visible rows and second visible rows. In that case, the processing module 1002 may specifically be configured to mark, according to the first preset condition, the data in the tenth column whose rows correspond to the first visible rows, and change the first visible rows to hidden rows;
and mark, according to the second preset condition, the data in the tenth column whose rows correspond to the second visible rows. Marking continues in this way until a third preset condition is met; marking then ends, and the marked eleventh column of data is obtained.
(6) The processing module 1002 may specifically be configured to perform the following steps. First, according to the first bitmap data, the expression, and the target data, it determines fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression. The first expression or the second expression relates to any column of the target data. Next, based on the target model, it performs a logical operation on the fourth and fifth bitmap data to obtain sixth bitmap data. Finally, it determines the sixth bitmap data as the execution result corresponding to the constraint condition.
In the embodiments of the present invention, the logical operation includes at least one of the following: AND, OR, and NOT.
In addition, in another possible embodiment, the processing module 1002 may specifically be configured to input the expression and the target data into the target model;
and, based on the target model, process the expression and the target data in parallel using a single instruction, multiple data (SIMD) instruction set to obtain the execution result corresponding to the constraint condition.
Note that the columnar database in the embodiments of the present invention stores data by column. Each column separately stores the data of at least one attribute.
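The column-per-attribute layout can be sketched by transposing row tuples into one list per attribute. This is a toy in-memory stand-in, not the storage format of any particular columnar database. The table, its attribute names, and the `to_columns` helper are hypothetical. The sketch shows why column storage suits the mask-based evaluation above. A query such as `SUM(price) WHERE qty > 2` only has to read the `qty` and `price` columns, and the filter result is itself a mask.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple], names: Sequence[str]) -> Dict[str, List]:
    """Transpose row tuples into one list per attribute (column)."""
    columns: Dict[str, List] = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


rows = [(1, "pen", 2.5), (4, "ink", 1.0), (3, "pad", 4.0)]
cols = to_columns(rows, ["qty", "item", "price"])

# SUM(price) WHERE qty > 2 touches only the "qty" and "price" columns.
mask = [int(q > 2) for q in cols["qty"]]
total = sum(p for m, p in zip(mask, cols["price"]) if m)
print(mask, total)  # [0, 1, 1] 5.0
```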
In summary, the embodiments of the present invention work as follows. A target model corresponding to an expression in a data query request is obtained, where the expression contains a constraint condition. The expression and the target data in a columnar database are then input into the target model to obtain an execution result corresponding to the constraint condition. The embodiments make full use of the fact that a columnar database can fetch a given column of data quickly. They also reduce the number of writes and outputs during expression evaluation. This greatly reduces data-access time and computation and improves data-processing efficiency, which speeds up SQL queries.
In addition, the incoming filter mask is used during processing so that data not satisfying the constraint conditions is filtered out first, which greatly reduces the number of expression evaluations. Computing in batches also makes full use of the SIMD instruction set for expression evaluation, which speeds up computation.
Figure 11 shows a schematic structural diagram of a computing device according to an embodiment.
Figure 11 is a structural diagram of an exemplary hardware architecture of a computing device. Such a device can implement the data processing method and the data processing apparatus according to the embodiments of the present invention.
The device may include a processor 1101 and a memory 1102 storing computer program instructions.
Specifically, the processor 1101 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC). Alternatively, it may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 1102 may include mass storage for data or instructions. By way of example and not limitation, the memory 1102 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 1102 may include removable or non-removable (fixed) media. Where appropriate, the memory 1102 may be internal or external to the integrated gateway device. In a particular embodiment, the memory 1102 is non-volatile solid-state memory. In a particular embodiment, the memory 1102 includes read-only memory (ROM). Where appropriate, this ROM may be any of the following: mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these.
The processor 1101 reads and executes the computer program instructions stored in the memory 1102 to implement any of the data processing methods in the foregoing embodiments.
The transceiver 1103 is mainly used for communication among the apparatuses in the embodiments of the present invention, or with other devices.
In one example, the device may further include a bus 1104. As shown in Figure 11, the processor 1101, the memory 1102, and the transceiver 1103 are connected through the bus 1104 and communicate with one another through it.
The bus 1104 includes hardware, software, or both. By way of example and not limitation, the bus may include any of the following: an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 1104 may include one or more buses. Although the embodiments of this application describe and show a specific bus, this application contemplates any suitable bus or interconnect.
In addition, an embodiment of the present invention provides a computer-readable storage medium corresponding to the foregoing data processing method. In a possible embodiment, the computer-readable storage medium stores a computer program. When executed on a computer, the program causes the computer to perform the data processing method according to the embodiments of the present invention.
It should be noted that the present invention is not limited to the specific configurations and processing described in the above embodiments and shown in the figures. For convenience and brevity, detailed descriptions of known methods are omitted here. For the specific working processes of the systems, modules, and units described above, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.
Those skilled in the art will understand that the method process of the present invention is not limited to the specific steps described and shown. After grasping the spirit of the present invention, they may make various changes, modifications, and additions within the technical scope disclosed by the present invention. They may also make equivalent substitutions or change the order of steps. All such modifications or substitutions shall fall within the protection scope of the present invention.

Claims (15)

  1. A data processing method, comprising:
    obtaining a target model corresponding to an expression in a data query request, the expression containing a constraint condition;
    inputting the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, wherein the target data comprises all of the data in at least one column and part of the data in the at least one column; and
    outputting the execution result.
  2. The method according to claim 1, wherein inputting the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint condition comprises:
    inputting first bitmap data, the expression, and the target data into the target model;
    filtering the target data using the first bitmap data to obtain data satisfying the constraint condition; and
    processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition.
  3. The method according to claim 2, wherein the target data comprises a first column of data and a second column of data, the first column of data and the second column of data having the same number of rows;
    and processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition comprises:
    adding, based on the target model, the first column of data and the second column of data row by row to obtain a third column of data, wherein the third column of data has the same number of rows as the first column of data; and
    determining the first bitmap data and the third column of data as the execution result corresponding to the constraint condition.
  4. The method according to claim 2, wherein the target data comprises a fourth column of data and constant data, the fourth column of data and the constant data having the same number of rows;
    and processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition comprises:
    when the constraint condition is to select data in the fourth column of data that is greater than the constant data, comparing, based on the target model, the fourth column of data with the constant data row by row to obtain the execution result corresponding to the constraint condition;
    wherein the execution result comprises second bitmap data, the second bitmap data being obtained by marking as visible rows, on the first bitmap data, the rows in which the fourth column of data is greater than the constant data, and the second bitmap data has the same number of rows as the fourth column of data.
  5. The method according to claim 2, wherein the target model includes a first model and a second model; the target data includes first target data and second target data; the first target data includes a fifth column of data and a sixth column of data, and the fifth column of data and the sixth column of data have the same number of rows;
    processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition includes:
    adding, based on the first model, the fifth column of data and the sixth column of data row by row to obtain a seventh column of data, wherein the seventh column of data has the same number of rows as the fifth column of data;
    inputting the first bitmap data, the seventh column of data and the second target data into the second model, wherein the second target data includes an eighth column of data, and the eighth column of data and the seventh column of data have the same number of rows;
    when the constraint condition is to select the data in the seventh column of data that are greater than the eighth column of data, comparing, based on the second model, the seventh column of data and the eighth column of data row by row to obtain the execution result corresponding to the constraint condition;
    wherein the execution result includes third bitmap data, which marks, on the first bitmap data, the visible rows corresponding to the rows in which the seventh column of data is greater than the eighth column of data, and the third bitmap data has the same number of rows as the seventh column of data.
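Claim 5 chains two models: a first model that adds two columns and a second model that compares the sum with a third column under the first bitmap. A minimal sketch, assuming list-of-booleans bitmaps and the hypothetical function names `add_columns` and `filter_gt_column`:

```python
from typing import List


def add_columns(a: List[int], b: List[int]) -> List[int]:
    """First model: row-wise sum of two columns with the same number of rows."""
    if len(a) != len(b):
        raise ValueError("columns must have the same number of rows")
    return [x + y for x, y in zip(a, b)]


def filter_gt_column(left: List[int], right: List[int], visible: List[bool]) -> List[bool]:
    """Second model: keep a row visible only where it is visible in the
    first bitmap and left[i] > right[i]."""
    if not (len(left) == len(right) == len(visible)):
        raise ValueError("columns and bitmap must have the same number of rows")
    return [v and x > y for x, y, v in zip(left, right, visible)]


col5 = [1, 4, 2, 8]
col6 = [2, 1, 9, 1]
col8 = [4, 4, 10, 10]
first_bitmap = [True, True, True, False]

col7 = add_columns(col5, col6)  # [3, 5, 11, 9]
third_bitmap = filter_gt_column(col7, col8, first_bitmap)
print(third_bitmap)  # [False, True, True, False]
```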
  6. The method according to claim 2, wherein the target data includes a ninth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the ninth column of data have the same number of rows;
    processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition includes:
    filtering out, based on the target model and using the hidden rows, the hidden rows of the ninth column of data to obtain the data in the ninth column of data located in the same rows as the visible rows;
    when the expression is a data summation, accumulating the data located in the visible rows to obtain accumulated data;
    determining the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.
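The summation in claim 6 can be sketched as follows, again assuming a list-of-booleans bitmap; `sum_visible` is a hypothetical name. Hidden rows are skipped before accumulation, and the unchanged first bitmap is returned alongside the sum, mirroring the claimed execution result.

```python
from typing import List, Tuple


def sum_visible(column: List[int], visible: List[bool]) -> Tuple[int, List[bool]]:
    """Filter out hidden rows, sum the remaining values, and return the sum
    together with the unchanged first bitmap."""
    if len(column) != len(visible):
        raise ValueError("column and bitmap must have the same number of rows")
    accumulated = sum(value for value, is_visible in zip(column, visible) if is_visible)
    return accumulated, visible


col9 = [10, 20, 30, 40]
first_bitmap = [True, False, True, False]
total, bitmap = sum_visible(col9, first_bitmap)
print(total)  # 40
```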
  7. The method according to claim 2, wherein the target data includes a tenth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the tenth column of data have the same number of rows;
    processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition includes:
    filtering out, based on the target model and using the hidden rows, the hidden rows of the tenth column of data to obtain the data in the tenth column of data located in the same rows as the visible rows;
    marking, according to a first preset condition in the target model, the data in the tenth column of data located in the same rows as the visible rows to obtain a marked eleventh column of data;
    determining the first bitmap data and the eleventh column of data as the execution result corresponding to the constraint condition.
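Claim 7 does not specify the first preset condition, so the sketch below takes it as an arbitrary predicate (an assumption). It represents the marked eleventh column as one entry per row: the predicate result for visible rows and `None` for hidden rows, which are filtered out before marking. `mark_visible` is a hypothetical name.

```python
from typing import Callable, List, Optional, Tuple


def mark_visible(
    column: List[int],
    visible: List[bool],
    condition: Callable[[int], bool],
) -> Tuple[List[bool], List[Optional[bool]]]:
    """Return (first bitmap, eleventh column). Visible rows are marked with the
    result of `condition`; hidden rows are filtered out and left as None."""
    if len(column) != len(visible):
        raise ValueError("column and bitmap must have the same number of rows")
    marks = [condition(value) if is_visible else None
             for value, is_visible in zip(column, visible)]
    return visible, marks


col10 = [3, 8, 15, 6]
first_bitmap = [True, True, False, True]
bitmap, col11 = mark_visible(col10, first_bitmap, lambda x: x % 2 == 0)
print(col11)  # [False, True, None, True]
```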
  8. The method according to claim 7, wherein, when the target model includes the first preset condition and a second preset condition, the visible rows include first visible rows and second visible rows;
    marking, according to the first preset condition in the target model, the data in the tenth column of data located in the same rows as the visible rows to obtain the marked eleventh column of data includes:
    marking, according to the first preset condition, the data in the tenth column of data located in the same rows as the first visible rows, and changing the first visible rows into hidden rows;
    marking, according to the second preset condition, the data in the tenth column of data located in the same rows as the second visible rows until a third preset condition is met, at which point marking ends and the marked eleventh column of data is obtained.
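Claim 8 leaves the preset conditions and the split between first and second visible rows open. One possible reading, offered only as an illustration, resembles first-match `CASE WHEN` evaluation: each preset condition marks the still-visible rows it matches, those rows are switched to hidden, and later conditions see only the remaining visible rows. The third preset condition is modeled here, by assumption, as a `stop` predicate over the current bitmap that defaults to "no visible rows remain". All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical rule format: (preset condition, label written into the marked column)
Rule = Tuple[Callable[[int], bool], str]


def no_visible_rows(bitmap: List[bool]) -> bool:
    """Default stand-in for the third preset condition."""
    return not any(bitmap)


def mark_in_stages(
    column: List[int],
    visible: List[bool],
    rules: Sequence[Rule],
    stop: Callable[[List[bool]], bool] = no_visible_rows,
) -> Tuple[List[Optional[str]], List[bool]]:
    """Apply preset conditions in order. Rows marked by one stage are switched
    from visible to hidden, so later stages only see the remaining visible rows.
    Marking ends when `stop` holds or every rule has been applied."""
    if len(column) != len(visible):
        raise ValueError("column and bitmap must have the same number of rows")
    bitmap = list(visible)  # copy; the caller's bitmap is untouched
    marks: List[Optional[str]] = [None] * len(column)
    for condition, label in rules:
        if stop(bitmap):
            break
        for i, value in enumerate(column):
            if bitmap[i] and condition(value):
                marks[i] = label
                bitmap[i] = False  # marked visible row becomes a hidden row
    return marks, bitmap


col10 = [5, 50, 500, 7]
first_bitmap = [True, True, True, False]
rules = [
    (lambda x: x >= 100, "large"),
    (lambda x: x >= 10, "medium"),
    (lambda x: True, "small"),
]
col11, remaining = mark_in_stages(col10, first_bitmap, rules)
print(col11)      # ['small', 'medium', 'large', None]
print(remaining)  # [False, False, False, False]
```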
  9. The method according to claim 2, wherein processing, based on the target model and the first bitmap data, the data satisfying the constraint condition to obtain the execution result corresponding to the constraint condition includes:
    determining, according to the first bitmap data, the expression and the target data, fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression, wherein the first expression or the second expression relates to any one column of the target data;
    performing, based on the target model, a logical operation on the fourth bitmap data and the fifth bitmap data to obtain sixth bitmap data;
    determining the sixth bitmap data as the execution result corresponding to the constraint condition.
  10. The method according to claim 9, wherein the logical operation includes at least one of the following: an AND gate, an OR gate and a NOT gate.
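Claims 9 and 10 combine per-expression bitmaps with logical operations. The sketch below assumes a filter such as `a > 10 AND b < 5`, builds the fourth and fifth bitmaps from the first bitmap, and then combines them. NOT is restricted to rows visible in the first bitmap, an assumption made here so that rows hidden before evaluation cannot reappear. Function names are hypothetical.

```python
from typing import Callable, List


def predicate_bitmap(column: List[int], visible: List[bool],
                     predicate: Callable[[int], bool]) -> List[bool]:
    """Bitmap for one sub-expression, derived from the first bitmap."""
    return [v and predicate(x) for x, v in zip(column, visible)]


def bitmap_and(a: List[bool], b: List[bool]) -> List[bool]:
    return [x and y for x, y in zip(a, b)]


def bitmap_or(a: List[bool], b: List[bool]) -> List[bool]:
    return [x or y for x, y in zip(a, b)]


def bitmap_not(a: List[bool], base: List[bool]) -> List[bool]:
    """NOT restricted to rows visible in `base`, so hidden rows stay hidden."""
    return [v and not x for x, v in zip(a, base)]


col_a = [3, 12, 20, 8]
col_b = [1, 9, 2, 0]
first_bitmap = [True, True, True, False]

fourth = predicate_bitmap(col_a, first_bitmap, lambda x: x > 10)  # [False, True, True, False]
fifth = predicate_bitmap(col_b, first_bitmap, lambda x: x < 5)    # [True, False, True, False]
print(bitmap_and(fourth, fifth))         # [False, False, True, False]
print(bitmap_or(fourth, fifth))          # [True, True, True, False]
print(bitmap_not(fourth, first_bitmap))  # [True, False, False, False]
```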
  11. The method according to claim 1, wherein inputting the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint condition includes:
    inputting the expression and the target data into the target model;
    processing, based on the target model, the expression and the target data in parallel using a single instruction, multiple data (SIMD) instruction set to obtain the execution result corresponding to the constraint condition.
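Python cannot issue SIMD instructions directly. As a stand-in for the data-parallel idea in claim 11, the sketch below packs bitmaps into 64-bit words so that a single bitwise AND combines 64 rows at once (a SWAR, "SIMD within a register", technique); an engine following the claim would instead use a hardware SIMD instruction set. The word size and function names are assumptions.

```python
from typing import List

WORD_BITS = 64  # assumed word width; one word holds the bits of 64 rows


def pack(bitmap: List[bool]) -> List[int]:
    """Pack a bitmap into words: row i goes to bit (i % 64) of word (i // 64)."""
    words = [0] * ((len(bitmap) + WORD_BITS - 1) // WORD_BITS)
    for i, bit in enumerate(bitmap):
        if bit:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def unpack(words: List[int], n_rows: int) -> List[bool]:
    """Expand packed words back into one boolean per row."""
    return [bool((words[i // WORD_BITS] >> (i % WORD_BITS)) & 1) for i in range(n_rows)]


def and_words(a: List[int], b: List[int]) -> List[int]:
    """One bitwise AND per word combines 64 rows at a time."""
    return [x & y for x, y in zip(a, b)]


def count_visible(words: List[int]) -> int:
    return sum(bin(w).count("1") for w in words)


n = 130
even = pack([i % 2 == 0 for i in range(n)])
third = pack([i % 3 == 0 for i in range(n)])
both = and_words(even, third)  # 3 word operations instead of 130 per-row ones
print(count_visible(both))     # 22 (rows 0, 6, ..., 126)
```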
  12. The method according to claim 1, wherein the data in the columnar database are stored by column, each column separately storing the data of at least one attribute.
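The column-wise layout in claim 12 can be shown by converting row records into one list per attribute; a filter on one attribute then reads only that attribute's list. The sample rows and the `to_columns` helper are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical sample table with three attributes per row.
rows: List[Dict[str, Any]] = [
    {"id": 1, "city": "Hangzhou", "price": 30},
    {"id": 2, "city": "Beijing", "price": 5},
    {"id": 3, "city": "Shanghai", "price": 18},
]


def to_columns(records: List[Dict[str, Any]], attributes: List[str]) -> Dict[str, List[Any]]:
    """Store each attribute in its own list (one column per attribute)."""
    return {attr: [record[attr] for record in records] for attr in attributes}


columns = to_columns(rows, ["id", "city", "price"])
# A filter on price touches only the "price" column, not whole rows.
visible = [p > 10 for p in columns["price"]]
print(visible)  # [True, False, True]
```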
  13. A data processing apparatus, comprising:
    an obtaining module, configured to obtain a target model corresponding to an expression in a data query request, the expression containing a constraint condition;
    a processing module, configured to input the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, wherein the target data includes all the data of at least one column and part of the data in the at least one column;
    an output module, configured to output the execution result.
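The three modules of claim 13 can be sketched as small classes. The expression format `(operator, column name, constant)`, the model registry and all class and function names are hypothetical; the patent does not define them.

```python
from typing import Callable, Dict, List, Tuple

Column = List[int]
Bitmap = List[bool]
Expression = Tuple[str, str, int]  # hypothetical: (operator, column name, constant)
Model = Callable[[Column, int, Bitmap], Bitmap]


def gt_model(column: Column, constant: int, visible: Bitmap) -> Bitmap:
    return [v and x > constant for x, v in zip(column, visible)]


def lt_model(column: Column, constant: int, visible: Bitmap) -> Bitmap:
    return [v and x < constant for x, v in zip(column, visible)]


class ObtainingModule:
    """Looks up the target model matching the operator in the expression."""

    def __init__(self, registry: Dict[str, Model]) -> None:
        self.registry = registry

    def obtain(self, expression: Expression) -> Model:
        return self.registry[expression[0]]


class ProcessingModule:
    """Feeds the expression's constant and the target column into the model."""

    def __init__(self, database: Dict[str, Column]) -> None:
        self.database = database

    def process(self, model: Model, expression: Expression, visible: Bitmap) -> Bitmap:
        _, column_name, constant = expression
        return model(self.database[column_name], constant, visible)


class OutputModule:
    """Outputs the execution result (here, simply printed and returned)."""

    def output(self, result: Bitmap) -> Bitmap:
        print(result)
        return result


database = {"price": [30, 5, 18]}
expression: Expression = ("gt", "price", 10)
model = ObtainingModule({"gt": gt_model, "lt": lt_model}).obtain(expression)
result = ProcessingModule(database).process(model, expression, [True, True, True])
OutputModule().output(result)  # prints [True, False, True]
```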
  14. A computing device, comprising at least one processor and a memory, wherein the memory is configured to store computer program instructions, and the processor is configured to execute the program in the memory to control the computing device to implement the data processing method according to any one of claims 1 to 12.
  15. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed in a computer, the computer is caused to perform the data processing method according to any one of claims 1 to 12.
PCT/CN2020/134785 2019-12-10 2020-12-09 Data processing method, apparatus and device, and storage medium WO2021115304A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911259423.4 2019-12-10
CN201911259423.4A CN112948413A (en) 2019-12-10 2019-12-10 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021115304A1 true WO2021115304A1 (en) 2021-06-17

Family

ID=76225601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134785 WO2021115304A1 (en) 2019-12-10 2020-12-09 Data processing method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112948413A (en)
WO (1) WO2021115304A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323867A1 (en) * 2011-05-26 2012-12-20 International Business Machines Corporation Systems and methods for querying column oriented databases
CN103678556A (en) * 2013-12-06 2014-03-26 华为技术有限公司 Method for processing column-oriented database and processing equipment
CN108140022A (en) * 2015-12-24 2018-06-08 华为技术有限公司 Data query method and Database Systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIXIN_33709609: "MonetDB: Research on Column-Oriented Database Architectures", CSDN BLOG, XP009528401, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_33709609/article/details/89780929> *

Also Published As

Publication number Publication date
CN112948413A (en) 2021-06-11

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20899488; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20899488; Country of ref document: EP; Kind code of ref document: A1)