WO2021115304A1 - Data processing method, apparatus and device, and storage medium

Info

Publication number: WO2021115304A1
Application number: PCT/CN2020/134785
Authority: WO
Inventors: 阮羽彬; 吴迪; 缪哲语; 李猛; 梁宇坤
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2019-12-10
Filing date: 2020-12-09
Publication date: 2021-06-17
Also published as: CN112948413A

Abstract

A data processing method, apparatus and device, and a storage medium, wherein the method may comprise: first, obtaining a target model corresponding to an expression in a data query request, the expression comprising a constraint condition (210); next, inputting the expression and target data in a column-oriented database into the target model to obtain an execution result corresponding to the constraint condition (220), wherein the target data comprise all data in at least one column and some data in at least one column; and then, outputting the execution result (230). The present invention is used for solving the problem in the related art of low data query speed of SQL statements due to low data processing efficiency.

Description

Data processing method, device, equipment and storage medium

This application claims the priority of the Chinese patent application with the application number 201911259423.4 and the invention title "Data processing method, device, equipment and storage medium" filed on December 10, 2019, the entire content of which is incorporated into this application by reference.

Technical field

The present invention relates to the field of communication technology, in particular to a data processing method, device, equipment and storage medium.

Background

When users query data with Structured Query Language (SQL) statements, the query can be evaluated through the iterator model. In the iterator model, an expression takes an abstract iterator as input, obtains the data of each data table in the database row by row through the iterator, calculates the result for each row, and outputs the results of the multiple rows, thereby obtaining the corresponding data.

However, this abstract iterator brings a performance loss. During iteration, fetching each row of data causes multiple layers of function calls, and fetching data row by row causes more writes and outputs. Both consume more resources and lead to excessive calculation, which reduces data processing efficiency and the data query speed of SQL statements.

Summary of the invention

One or more embodiments of the present invention describe a data processing method, apparatus, device, and storage medium to solve the problem in the related art that low data processing efficiency slows down data queries made with SQL statements.

In order to solve the above technical problems, the present invention is implemented as follows:

According to the first aspect, a data processing method is provided, and the method may include:

obtaining a target model corresponding to an expression in a data query request, the expression containing a constraint condition;

inputting the expression and target data in a columnar database into the target model to obtain an execution result corresponding to the constraint condition, wherein the target data includes all the data in at least one column and part of the data in at least one column; and

outputting the execution result.

According to a second aspect, there is provided a data processing device, which may include:

The obtaining module is used to obtain the target model corresponding to the expression in the data query request, and the expression includes constraint conditions;

The processing module is used to input expressions and target data in the columnar database into the target model to obtain execution results corresponding to the constraint conditions; wherein the target data includes all data in at least one column and part of data in at least one column;

The output module is used to output the execution result.

According to a third aspect, a computing device is provided. The device includes at least one processor and a memory, the memory is used to store computer program instructions, and the processor is used to execute a program in the memory to control the computing device to implement the data processing method shown in the first aspect.

According to a fourth aspect, there is provided a computer-readable storage medium on which a computer program is stored. If the computer program is executed in a computer, the computer is caused to execute the data processing method shown in the first aspect.

In the embodiment of the present invention, the target model corresponding to the expression in the data query request is obtained, where the expression contains a constraint condition; the expression and the target data in the columnar database are input into the target model to obtain the execution result corresponding to the constraint condition. The embodiment of the present invention thus makes full use of the ability of a columnar database to quickly obtain a given column of data. This reduces the number of writes or outputs in expression calculations, greatly reduces the time needed to obtain data, reduces the amount of calculation, and improves data processing efficiency, thereby improving the data query speed of SQL statements.

Description of the drawings

The present invention can be better understood from the following description of the specific embodiments of the present invention in conjunction with the accompanying drawings, wherein the same or similar reference signs indicate the same or similar features.

Fig. 1 shows a schematic structural diagram of a data processing method according to an embodiment;

Fig. 2 shows a flowchart of a data processing method according to an embodiment;

Fig. 3 shows a flowchart of a method for realizing data processing by a first target model according to an embodiment;

Fig. 4 shows a flowchart of a method for realizing data processing by a second target model according to an embodiment;

Fig. 5 shows a flowchart of a method for realizing data processing by a first target model and a second target model according to an embodiment;

Fig. 6 shows a flow chart of a method for realizing data processing by a third target model according to an embodiment;

Fig. 7 shows a flowchart of a method for realizing data processing by a fourth target model according to an embodiment;

Fig. 8 shows a flow chart of a method for implementing data processing by a fifth target model according to an embodiment;

Fig. 9 shows a flowchart of another method for implementing data processing by a fifth target model according to an embodiment;

Fig. 10 shows a structural block diagram of a data processing device according to an embodiment;

Fig. 11 shows a schematic structural diagram of a computing device according to an embodiment.

Detailed description

The features and exemplary embodiments of each aspect of the present invention will be described in detail below. In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it. For those skilled in the art, the present invention can be implemented without some of these specific details. The following description of the embodiments is only to provide a better understanding of the present invention by showing examples of the present invention.

It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. Without further restrictions, an element defined by the phrase "including..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

At present, in the iterator model, an expression takes an abstract iterator as input and produces calculation results as output. When an expression is calculated with an iterator, the data of a data table in a row-oriented database is processed row by row. For each row in the table, the iterator fetches the data of that row, the calculation is performed, and the result is written back. After one row is processed, the iterator moves to the next row and the same calculation is repeated. This forms a loop that runs until the iterator has swept through all the rows; the loop then terminates, the expression calculation moves on to the next model, and a similar loop is performed there. Because the iterator abstracts the entire data table, fetching data through it incurs multiple layers of function calls, which reduces overall processing performance. In addition, every movement of the iterator involves a data fetch, which often entails many writes and outputs. This brings large overhead and excessive calculation, thereby reducing data processing efficiency and the data query speed of SQL statements.
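For contrast with the batch-based approach described later, the row-by-row loop of the iterator model can be sketched as follows. This is only an illustration under stated assumptions, not code from the application: the class `RowIterator`, the function `eval_add_row_by_row`, and the dictionary-based row layout are all hypothetical.

```python
class RowIterator:
    """Hypothetical abstract iterator over a row-oriented table."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._rows)

    def next_row(self):
        # Every row fetch goes through a function call on the iterator.
        row = self._rows[self._pos]
        self._pos += 1
        return row


def eval_add_row_by_row(rows):
    """Evaluate a + b one row at a time, writing back one result per row."""
    it = RowIterator(rows)
    results = []
    while it.has_next():
        row = it.next_row()
        results.append(row["a"] + row["b"])
    return results


table = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
print(eval_add_row_by_row(table))
```

Each result costs at least two calls on the iterator plus one write, which is the per-row overhead the paragraph above describes.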

In order to solve the above technical problems, embodiments of the present invention provide a data processing method, apparatus, device, and storage medium, which are described in detail below.

First, a data processing architecture provided by an embodiment of the present invention will be described.

As shown in Figure 1, the architecture may include a target node and a columnar database. When the target node receives a data query request, it determines the target model corresponding to the expression in the data query request, the expression containing a constraint condition; it then inputs the expression and the target data in the columnar database into the target model and obtains the execution result corresponding to the constraint condition, wherein the target data includes all the data in at least one column and part of the data in at least one column; finally, it outputs the execution result.

The columnar database in the embodiment of the present invention may include a Histore columnar database. The Histore columnar database is a columnar database evolved from an open source database. The data of each data table in the database is stored by column, so all rows of a column can be obtained quickly when an expression is calculated. This makes the database naturally suited to batch processing of data. The new batch-based target model in the embodiment of the present invention uses this feature to divide the data in a data table into batches, and each expression processes the data in units of batches, which speeds up the processing of the entire expression.

In addition, the above architecture can be applied to scenarios where users query data with SQL statements, or to scenarios where expression calculations are performed based on the aforementioned target model.

Thus, by obtaining the target model corresponding to the expression in the data query request and inputting the expression and the target data in the columnar database into the target model, the execution result corresponding to the constraint condition of the expression is obtained. The embodiment of the present invention makes full use of the ability of a columnar database to quickly obtain a given column of data, which reduces the number of writes or outputs in expression calculations, greatly reduces the time needed to obtain data, reduces the amount of calculation, and improves data processing efficiency.
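The batch-oriented evaluation described above can be sketched as follows. This is a minimal illustration assuming that a column is held as a plain Python list; the helper names `iter_batches` and `eval_add_batched` and the `batch_size` parameter are hypothetical and do not come from the application.

```python
def iter_batches(column, batch_size):
    """Yield consecutive slices (batches) of a column stored as a list."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def eval_add_batched(col_a, col_b, batch_size=4):
    """Evaluate a + b batch by batch instead of row by row."""
    out = []
    batches = zip(iter_batches(col_a, batch_size), iter_batches(col_b, batch_size))
    for batch_a, batch_b in batches:
        # One fetch per batch; the inner loop touches contiguous data only.
        out.extend(x + y for x, y in zip(batch_a, batch_b))
    return out


print(eval_add_batched([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2))
```

In a real engine the inner loop would typically be a vectorized kernel; plain lists are used here only to keep the sketch self-contained.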

Here, based on the foregoing architecture and application scenarios, the embodiment of the present invention further illustrates the data processing method provided by the embodiment of the present invention with reference to FIGS. 2-9.

Fig. 2 shows a flowchart of a data processing method according to an embodiment.

As shown in Figure 2, the method may include steps 210 to 230. First, in step 210, the target model corresponding to the expression in the data query request is obtained, the expression containing a constraint condition. Second, in step 220, the expression and the target data in the columnar database are input into the target model, and the execution result corresponding to the constraint condition is obtained. Then, in step 230, the execution result is output.

The above steps are described in detail below:

First, referring to step 210, the target node receives a data query request;

The data query request includes an expression, which may be, for example, "count the number of clicks on a certain product" or "query the number of visits to a certain platform". The expression may include at least one constraint condition. For example, the expression "count the number of clicks on a certain product" may carry constraint conditions such as "count the clicks on the product from January to June" and "select the data in which the number of clicks on the product is within 1-1000".

After receiving the data query request, the method provided by the embodiment of the present invention may further include: determining the target model corresponding to the expression in the data query request according to the data query request.

Here, each expression has its corresponding target model. In the embodiment of the present invention, six target models are provided, which will be described in detail below in conjunction with step 220.
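One possible way to associate an expression with its target model is a lookup keyed by the operator of the expression. The sketch below is an assumption made for illustration only; the application does not specify how the lookup is implemented, and `MODEL_REGISTRY`, `get_target_model`, and the two small model functions are hypothetical names.

```python
def add_model(mask, col_x, col_y):
    """Hypothetical summation model: returns (mask, x + y) for visible rows."""
    return mask, [x + y if m else None for m, x, y in zip(mask, col_x, col_y)]


def greater_than_model(mask, col_x, col_y):
    """Hypothetical comparison model: returns a result mask for x > y."""
    return [1 if m and x > y else 0 for m, x, y in zip(mask, col_x, col_y)]


# Hypothetical registry: operator of the expression -> target model.
MODEL_REGISTRY = {"+": add_model, ">": greater_than_model}


def get_target_model(operator):
    """Step 210 (sketch): obtain the target model for an expression operator."""
    try:
        return MODEL_REGISTRY[operator]
    except KeyError:
        raise ValueError(f"no target model registered for operator {operator!r}") from None


model = get_target_model(">")
print(model([1, 1], [5, 1], [3, 3]))
```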

Secondly, referring to step 220, the target data in the embodiment of the present invention includes all the data in at least one column and part of the data in at least one column. In addition, the data in the columnar database is stored in columns, and each column stores data of at least one attribute separately.

Based on this, in a possible embodiment, this step may specifically include: inputting the expression and target data into the target model;

Based on the target model, the expression and the target data are processed in parallel using the Single Instruction Multiple Data (SIMD) instruction set to obtain the execution result corresponding to the constraint condition.

As a result, by means of batch calculation, the SIMD instruction set can be fully utilized to perform expression calculations, which speeds up calculations.

In another possible embodiment, this step may specifically include inputting the first bitmap data, expression, and target data into the target model;

Use the first bitmap data to filter the target data to obtain data that meets the constraint conditions;

Based on the target model and the first bitmap data, the data that meets the constraint conditions are processed, and the execution result corresponding to the constraint conditions is obtained.

Therefore, by using the incoming first bitmap data (the filter mask) during processing, that is, by first filtering out data that does not meet the constraint conditions, the number of expression operations can be reduced.

It should be noted that the second possible embodiment can be implemented on the basis of the first possible embodiment, that is, the first bitmap data, the expression, and the target data are input into the target model;

Based on the target model and the first bitmap data, the expression and the target data are processed in parallel using the Single Instruction Multiple Data (SIMD) instruction set to obtain the execution result corresponding to the constraint condition.

In this way, the target data can be processed in batches, making full use of the columnar database's ability to quickly obtain a column of data, reducing the number of writes and outputs in expression calculations, and greatly reducing the time needed to obtain query results. Batch processing is also cache-friendly. Together, these optimizations improve the overall performance of the entire expression. In addition, by using the incoming filter mask during processing, that is, by first filtering out data that does not meet the constraint conditions, the number of expression operations can be greatly reduced. At the same time, by means of batch calculation, the SIMD instruction set can be fully utilized to perform expression calculations, which speeds up calculations.

Based on the above, the embodiment of the present invention describes in detail, with reference to at least one of the following target models, the above step of "processing, based on the target model and the first bitmap data, the data that meets the constraint condition to obtain the execution result corresponding to the constraint condition".

(1) When the target model is the first target model, the target data includes the first column of data and the second column of data, and the first column of data and the second column of data have the same number of rows, this step may specifically include:

Based on the first target model, add the data in the first column and the data in the second column with the same row to obtain the data in the third column; wherein the data in the third column and the data in the first column have the same number of rows;

The first bitmap data and the third column data are determined as the execution result corresponding to the constraint condition.

Wherein, the input of the first target model may include the first bitmap data, the expression, the first column of data, and the second column of data. Through the above data processing method, the output of the first target model may include the first bitmap data and the third column data.

For example, when the first target model is a summation model, Figure 3 shows the tree structure of the expression a+b and its implementation model. Here, a and b are any two columns in the target data, that is, the first column of data and the second column of data. The expression "+", the mask (bitmap data), a, and b are taken as input, the data of a and b are added in each row, and the result is output. In the batch-based first target model, the expression receives a mask for filtering, a batch of data in column a, and a batch of data in column b. The mask is used to skip the rows that have already been filtered out (that is, rows that do not meet the constraint conditions and/or rows that have already been marked according to a preset condition); the remaining rows are then added in batches, and finally a mask and the result a+b of this batch of data are output. Because this expression does not change the input mask (it marks no additional row as hidden, 0, and the visible rows remain marked 1), the input mask can be used directly as the output.
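The summation model of Figure 3 can be sketched as follows. This is a minimal illustration, assuming the mask is a list of 0/1 flags (1 = visible row, 0 = hidden row) and that result slots of hidden rows are left as `None`; the function name `add_model` and the `None` placeholder are assumptions, not part of the application.

```python
def add_model(mask, batch_a, batch_b):
    """Sketch of the first target model: returns (mask, a + b).

    mask[i] == 1 marks a visible row, 0 a hidden (already filtered) row.
    Hidden rows are skipped; their result slot is left as None (an assumption).
    """
    if not (len(mask) == len(batch_a) == len(batch_b)):
        raise ValueError("mask and both columns must have the same number of rows")
    result = [a + b if m else None for m, a, b in zip(mask, batch_a, batch_b)]
    # The addition hides no further rows, so the input mask is passed through.
    return mask, result


out_mask, a_plus_b = add_model([1, 0, 1], [1, 2, 3], [10, 20, 30])
print(out_mask, a_plus_b)
```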

(2) When the target model is the second target model, and the target data includes the fourth column of data and constant data, and the fourth column of data and constant data have the same number of rows, this step may specifically include:

When the constraint condition is to select data larger than the constant data in the fourth column of data, based on the second target model, compare the fourth column of data with the same number of rows with the constant data to obtain the execution result corresponding to the constraint condition;

Wherein, the execution result includes second bitmap data. The second bitmap data is obtained by marking, on the first bitmap data, as visible rows the rows in which the fourth column of data is greater than the constant data, and the second bitmap data has the same number of rows as the fourth column of data.

The input of the second target model may include first bitmap data, expressions, fourth column data, and constant data. Through the above data processing method, the output of the second target model may include the second bitmap data.

For example, Figure 4 shows the tree structure of the expression a>3 and its implementation model, where a is the fourth column of data and 3 is the constant data. In the batch-based second target model, the expression receives as input a mask, a batch of data in column a, and a batch of the constant 3 with the same amount of data as the batch of a. The output mask is based on the input mask. After the mask is used to filter out some rows (that is, rows that do not meet the constraint conditions), a>3 is compared in batches. If the value of a in a row is less than or equal to 3, that row is reset in the result mask (that is, rows that do not meet the constraint condition are marked as hidden rows, 0, and rows that meet the constraint condition and satisfy the comparison are marked as visible rows, 1). Finally, a result mask is output, in which the visible rows, 1, are the rows that meet the constraint condition and satisfy the comparison.

(3) When the target model includes a first model and a second model (that is, the first target model and the second target model), the target data includes first target data and second target data, the first target data includes the fifth column of data and the sixth column of data, and the fifth column of data and the sixth column of data have the same number of rows, this step may specifically include:
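The comparison model of Figure 4 can be sketched as follows. This is a minimal illustration under the same assumptions as before (the mask is a list of 0/1 flags); the function name `greater_than_model` is hypothetical. The constant is passed as a batch of repeated values, matching the description that the constant data has the same number of rows as column a.

```python
def greater_than_model(mask, batch_a, batch_const):
    """Sketch of the second target model: returns a result mask for a > const.

    A row stays visible (1) only if it was visible in the input mask
    and its value is greater than the constant; otherwise it is reset to 0.
    """
    if not (len(mask) == len(batch_a) == len(batch_const)):
        raise ValueError("mask and both columns must have the same number of rows")
    return [1 if m and a > c else 0 for m, a, c in zip(mask, batch_a, batch_const)]


a = [5, 2, 9, 3]
result_mask = greater_than_model([1, 1, 0, 1], a, [3] * len(a))
print(result_mask)
```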

Based on the first model, add the data in the fifth column and the data in the sixth column with the same row to obtain the data in the seventh column; wherein the data in the seventh column and the data in the fifth column have the same number of rows;

Input the first bitmap data, the seventh column data, and the second target data into the second model; wherein, the second target data includes the eighth column of data, and the eighth column of data and the seventh column of data have the same number of rows;

When the constraint condition is to select data in the seventh column of data that is greater than the eighth column of data, based on the second model, compare the seventh column of data with the same number of rows and the eighth column of data to obtain the execution result corresponding to the constraint condition;

Wherein, the execution result includes third bitmap data. The third bitmap data is obtained by marking, on the first bitmap data, as visible rows the rows in which the seventh column of data is greater than the eighth column of data, and the third bitmap data has the same number of rows as the seventh column of data.

For example, Figure 5 shows the combined expression (a+b)>c, where c is the second target data. This expression is a combination of (1) and (2). The mask and the result (a+b) output in (1) can be used as input of the expression (a+b)>c. In addition, a batch of data with the same amount of data is taken from c as input. Finally, the operation (a+b)>c is performed to obtain a result mask.

Therefore, based on the first target model and the second target model, using the incoming first bitmap data (the filter mask) during processing and passing the result mask to the next node (which may be the second target model and/or another target node) speeds up processing and greatly reduces the number of expression operations.
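The chaining of the two models for (a+b)>c can be sketched as follows. This is an illustrative composition only; the function names and the `None` placeholder for hidden rows are assumptions, and both models are redefined here so that the block is self-contained.

```python
def add_model(mask, col_x, col_y):
    """First model (sketch): x + y for visible rows, mask passed through."""
    return mask, [x + y if m else None for m, x, y in zip(mask, col_x, col_y)]


def greater_than_model(mask, col_x, col_y):
    """Second model (sketch): result mask for x > y on visible rows only."""
    # `m and ...` short-circuits, so hidden rows (None values) are never compared.
    return [1 if m and x > y else 0 for m, x, y in zip(mask, col_x, col_y)]


def eval_sum_greater(mask, a, b, c):
    """Evaluate (a + b) > c by feeding the output of the first model to the second."""
    mask, a_plus_b = add_model(mask, a, b)
    return greater_than_model(mask, a_plus_b, c)


print(eval_sum_greater([1, 1, 0, 1], [1, 2, 3, 4], [1, 1, 1, 1], [1, 5, 0, 5]))
```

Because the mask travels with the intermediate column, the second model never touches rows that were already hidden before the addition.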

(4) When the target model is the third target model, the target data includes the ninth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the ninth column of data have the same number of rows, this step may include:

Based on the target model, use the hidden rows of the first bitmap data to filter out the corresponding rows in the ninth column of data, and obtain the data in the ninth column of data located in the same rows as the visible rows;

When the expression is a data summation, accumulate the data located in the same rows as the visible rows to obtain accumulated data;

The accumulated data and the first bitmap data are determined as the execution result corresponding to the constraint condition.

Wherein, the input of the third target model may include the first bitmap data, the expression, and the ninth column of data. Through the above-mentioned data processing method, the output of the third target model may include accumulated data and the first bitmap data.

For example, Figure 6 shows the tree structure of the aggregation expression SUM and its implementation model. This expression takes a mask for filtering and the ninth column of data a as input. The rows marked 0 in the mask are used to filter out the corresponding rows in the ninth column of data, the remaining data (that is, the rows corresponding to visible rows, 1) is summed, and finally a mask and the sum of this batch of data (sum(a)) are output.

(5) When the target model is the fourth target model, the target data includes the tenth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the tenth column of data have the same number of rows, this step may include:

Based on the target model, use the hidden rows of the first bitmap data to filter out the corresponding rows in the tenth column of data, and obtain the data in the tenth column of data located in the same rows as the visible rows;
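The SUM model of Figure 6 can be sketched as follows. This is a minimal illustration assuming a 0/1 list mask; the function name `sum_model` is hypothetical.

```python
def sum_model(mask, batch_a):
    """Sketch of the third target model: returns (mask, sum of visible rows)."""
    if len(mask) != len(batch_a):
        raise ValueError("mask and column must have the same number of rows")
    # Rows marked 0 are skipped; only rows marked 1 contribute to the sum.
    total = sum(a for m, a in zip(mask, batch_a) if m)
    return mask, total


print(sum_model([1, 0, 1, 1], [10, 20, 30, 40]))
```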

According to the first preset condition in the target model, mark the data in the tenth column of data that has the same row as the visible row to obtain the marked eleventh column of data;

Determine the first bitmap data and the eleventh column of data as the execution result corresponding to the constraint condition.

Further, when the target model includes the first preset condition and a second preset condition, and the visible rows include a first visible row and a second visible row, the step of marking, according to the first preset condition in the target model, the data in the tenth column of data located in the same rows as the visible rows to obtain the marked eleventh column of data includes:

According to the first preset condition, mark the data in the tenth column of data that has the same row as the first visible row, and adjust the first visible row to a hidden row;

According to the second preset condition, mark the data in the tenth column of data located in the same row as the second visible row; marking ends when the third preset condition is met, and the marked eleventh column of data is obtained.

The input of the fourth target model may include the first bitmap data (in this example, the first bitmap data includes hidden rows, 0, and visible rows, 1), the expression (carrying three constraint conditions), and the tenth column of data. Through the above data processing method, the output of the fourth target model may include the first bitmap data and the eleventh column of data.

For example, Figure 7 shows the tree structure of the conditional function (CASE WHEN) expression and its implementation model. This expression contains three jump branches, that is, three conditional judgments (two "when ... then" branches and one "default then" branch). Initially, the mask is used to filter out part of the data (the rows shown as black squares are hidden rows, 0). In the first conditional judgment (when a), the rows shown as gray squares meet the condition, so those rows are set to the condition result 5 and are then filtered out (that is, marked as hidden rows, 0), so that they are not considered in the next conditional judgment. In the second conditional judgment (when b), the rows shown as black squares are filtered out first; the operation then finds that the rows shown as gray squares satisfy the second condition, so those rows are set to the condition result 6 and are likewise filtered out (that is, marked as hidden rows, 0) and not considered in the next conditional judgment. In the last conditional judgment, all the remaining rows (the rows shown as white squares) are handled by the default branch and are set to the condition result 7. Finally, this expression outputs a mask and all condition results that meet the constraints, where default represents the choice made when none of the conditions is met.
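The CASE WHEN evaluation of Figure 7 can be sketched as follows. This is an illustration under stated assumptions: the function name `case_when_model`, the representation of branches as (predicate, value) pairs, and the concrete predicates `x == 1` and `x == 2` are hypothetical; only the condition results 5, 6, and 7 are taken from the example above. A working copy of the mask is used to hide rows that have already matched a branch, and the input mask itself is returned unchanged.

```python
def case_when_model(mask, batch, branches, default):
    """Sketch of the fourth target model.

    branches: list of (predicate, value) pairs, evaluated in order.
    Returns (mask, results); hidden rows keep None as their result.
    """
    if len(mask) != len(batch):
        raise ValueError("mask and column must have the same number of rows")
    remaining = list(mask)            # working copy: rows still to be decided
    results = [None] * len(batch)
    for predicate, value in branches:
        for i, x in enumerate(batch):
            if remaining[i] and predicate(x):
                results[i] = value
                remaining[i] = 0      # hide the row for the following branches
    for i in range(len(batch)):
        if remaining[i]:              # no branch matched: default branch
            results[i] = default
    return mask, results


branches = [(lambda x: x == 1, 5), (lambda x: x == 2, 6)]  # hypothetical conditions
print(case_when_model([0, 1, 1, 1, 1], [9, 1, 2, 3, 1], branches, default=7))
```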

(6) When the target model is the fifth target model, this step may specifically include:

According to the first bitmap data, the expression, and the target data, determine fourth bitmap data corresponding to a first expression and fifth bitmap data corresponding to a second expression, wherein the first expression or the second expression is related to any column of data in the target data;

Based on the target model, perform logical operations on the fourth bitmap data and the fifth bitmap data to obtain the sixth bitmap data;

The sixth bitmap data is determined as the execution result corresponding to the constraint condition.

Wherein, the logic operation includes at least one of the following: an AND gate, an OR gate, and a NOT gate.

It should be noted that the input of the fifth target model may include the fourth bitmap data corresponding to the first expression and the fifth bitmap data corresponding to the second expression. The fourth bitmap data and the fifth bitmap data can be obtained in the manner described in (2) above: the result mask output for input a can be understood as the fourth bitmap data (mask) corresponding to the first expression a; in the same way, if the input in (2) is b, the mask output for input b can be understood as the fifth bitmap data (mask) corresponding to the second expression b.

Based on this, when the logical operation is an AND gate, Figure 8 shows the tree structure of an expression containing AND and its implementation model. The expression in the fifth target model receives the result masks of the two sub-expressions as input (the mask output of expression a and the mask output of expression b), and the two input masks are ANDed to obtain the result mask, which is finally output.

When the logical operation is an OR gate, Figure 9 shows the tree structure of an expression containing OR and its implementation model. The expression in the fifth target model receives the result masks of the two sub-expressions as input (the mask output of expression a and the mask output of expression b), and an OR operation is performed on the two input masks to obtain the result mask, which is finally output.

Therefore, based on the second target model and the fifth target model, using the incoming first bitmap data (the filter mask) during processing and passing the result mask to the next node speeds up processing and greatly reduces the number of expression operations.
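The AND and OR cases of Figures 8 and 9 reduce to element-wise logic on two masks, which can be sketched as follows. The function names `and_model` and `or_model` are hypothetical, and masks are again assumed to be lists of 0/1 flags.

```python
def _check_same_length(mask_a, mask_b):
    if len(mask_a) != len(mask_b):
        raise ValueError("both masks must have the same number of rows")


def and_model(mask_a, mask_b):
    """Sketch of the fifth target model for AND: a row is visible only if both masks mark it 1."""
    _check_same_length(mask_a, mask_b)
    return [x & y for x, y in zip(mask_a, mask_b)]


def or_model(mask_a, mask_b):
    """Sketch of the fifth target model for OR: a row is visible if either mask marks it 1."""
    _check_same_length(mask_a, mask_b)
    return [x | y for x, y in zip(mask_a, mask_b)]


mask_of_a = [1, 1, 0, 0]   # result mask of sub-expression a
mask_of_b = [1, 0, 1, 0]   # result mask of sub-expression b
print(and_model(mask_of_a, mask_of_b), or_model(mask_of_a, mask_of_b))
```

If the masks were packed into machine words instead of lists, each operation would handle many rows with a single bitwise instruction.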

To sum up, in the embodiment of the present invention, the target model corresponding to the expression in the data query request is obtained, where the expression contains a constraint condition; the expression and the target data in the columnar database are input into the target model to obtain the execution result corresponding to the constraint condition. The embodiment of the present invention thus makes full use of the ability of a columnar database to quickly obtain a given column of data and reduces the number of writes or outputs in expression calculations, so that the time needed to obtain data is greatly reduced, the amount of calculation is reduced, and data processing efficiency is improved, thereby improving the data query speed of SQL statements.

In addition, applying the incoming filter mask during processing, that is, first filtering out the data that does not meet the constraint conditions, can greatly reduce the number of expression operations. At the same time, batch calculation allows the SIMD instruction set to be fully utilized for expression calculations, which speeds up calculation.
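The effect of the incoming filter mask on the number of expression operations can be illustrated with a small sketch. It assumes a boolean list as the mask (True for a visible row), and the helper `filter_with_mask`, the sample column, and the predicate are hypothetical values chosen only for illustration.

```python
from typing import Callable, List, Tuple


def filter_with_mask(
    column: List[int],
    mask: List[bool],
    predicate: Callable[[int], bool],
) -> Tuple[List[bool], int]:
    """Evaluate the predicate only on rows the incoming mask leaves visible.

    Returns the resulting mask and the number of predicate evaluations.
    """
    if len(column) != len(mask):
        raise ValueError("column and mask must have the same number of rows")
    evaluations = 0
    result = []
    for value, visible in zip(column, mask):
        if visible:
            evaluations += 1
            result.append(predicate(value))
        else:
            # Hidden rows are skipped and stay hidden in the result mask.
            result.append(False)
    return result, evaluations


prices = [5, 12, 7, 30, 18, 2]
incoming_mask = [True, False, True, True, False, False]

out_mask, n_evals = filter_with_mask(prices, incoming_mask, lambda v: v > 6)
```

With this input the predicate runs on three of the six rows, because the other three were already hidden by the incoming mask.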

Based on the foregoing data processing method, an embodiment of the present invention provides a data processing device. Fig. 10 shows a structural block diagram of a data processing device according to an embodiment.

As shown in FIG. 10, the device 1000 may specifically include:

The obtaining module 1001 is used to obtain the target model corresponding to the expression in the data query request, and the expression includes constraint conditions;

The processing module 1002 is used to input expressions and target data in the columnar database into the target model to obtain execution results corresponding to the constraint conditions; wherein the target data includes all data in at least one column and part of data in at least one column;

The output module 1003 is used to output execution results.
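The division of the device into three modules can be sketched as a small class. This is only an illustration under stated assumptions: the class name `DataProcessingDevice`, the lookup of models by expression string, and the toy model `greater_than_ten` are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, Dict, List

Column = List[int]
# Assumed model signature: (expression, target data) -> result mask.
Model = Callable[[str, Dict[str, Column]], List[bool]]


class DataProcessingDevice:
    """Toy counterpart of device 1000 with obtaining, processing and output steps."""

    def __init__(self, models: Dict[str, Model]) -> None:
        self._models = models

    def obtain(self, expression: str) -> Model:
        # Obtaining module: find the target model for the expression.
        return self._models[expression]

    def process(self, expression: str, target_data: Dict[str, Column]) -> List[bool]:
        # Processing module: feed the expression and target data to the model.
        model = self.obtain(expression)
        return model(expression, target_data)

    def output(self, result: List[bool]) -> List[bool]:
        # Output module: hand the execution result back to the caller.
        return result


def greater_than_ten(expression: str, data: Dict[str, Column]) -> List[bool]:
    """Hypothetical model for the constraint 'a > 10'."""
    return [value > 10 for value in data["a"]]


device = DataProcessingDevice({"a > 10": greater_than_ten})
result = device.output(device.process("a > 10", {"a": [4, 11, 25]}))
```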

In a possible embodiment, the processing module 1002 may be specifically used to input the first bitmap data, expressions, and target data into the target model;

Further, the embodiment of the present invention provides six possible target models for detailed description.

(1) When the target data includes the first column of data and the second column of data, and the first column of data and the second column of data have the same number of rows, the processing module 1002 can be specifically used to: based on the target model, add the first column of data and the second column of data in the same row to obtain the third column of data, where the third column of data has the same number of rows as the first column of data; and determine the first bitmap data and the third column of data as the execution result corresponding to the constraint condition.

(2) When the target data includes the fourth column of data and constant data, and the fourth column of data and the constant data have the same number of rows, the processing module 1002 can be specifically used to: when the constraint condition is to select the data in the fourth column of data that is greater than the constant data, compare, based on the target model, the fourth column of data with the constant data having the same number of rows to obtain the execution result corresponding to the constraint condition;

(3) When the target model includes the first model and the second model, the target data includes the first target data and the second target data, the first target data includes the fifth column of data and the sixth column of data, and the fifth column of data and the sixth column of data have the same number of rows, the processing module 1002 can be specifically used to add, based on the first model, the fifth column of data and the sixth column of data in the same row to obtain the seventh column of data, where the seventh column of data has the same number of rows as the fifth column of data;

The execution result includes the third bitmap data. The third bitmap data includes the visible rows, marked on the first bitmap data, that correspond to the rows in which the seventh column of data is greater than the eighth column of data having the same number of rows, and the third bitmap data has the same number of rows as the seventh column of data.

(4) When the target data includes the ninth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the ninth column of data have the same number of rows, the processing module 1002 can be specifically used to: based on the target model, use the hidden rows to filter out the corresponding rows in the ninth column of data, obtaining the data in the ninth column of data that has the same rows as the visible rows; when the expression is data summation, accumulate the data in the same rows as the visible rows to obtain the accumulated data; and determine the accumulated data and the first bitmap data as the execution result corresponding to the constraint condition.

(5) When the target data includes the tenth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the tenth column of data have the same number of rows, the processing module 1002 can be specifically used to: based on the target model, use the hidden rows to filter out the corresponding rows in the tenth column of data, obtaining the data in the tenth column of data that has the same rows as the visible rows; mark, according to the first preset condition in the target model, the data in the tenth column of data that has the same rows as the visible rows to obtain the marked eleventh column of data; and determine the first bitmap data and the eleventh column of data as the execution result corresponding to the constraint condition.

Further, when the target model includes the first preset condition and the second preset condition, and the visible rows include a first visible row and a second visible row, the processing module 1002 can be specifically configured to mark, according to the first preset condition, the data in the tenth column of data that has the same row as the first visible row, and adjust the first visible row to a hidden row;

(6) The processing module 1002 can be specifically used to: determine, according to the first bitmap data, the expression, and the target data, the fourth bitmap data corresponding to the first expression and the fifth bitmap data corresponding to the second expression, where the first expression or the second expression is related to any column of data in the target data; perform, based on the target model, a logical operation on the fourth bitmap data and the fifth bitmap data to obtain the sixth bitmap data; and determine the sixth bitmap data as the execution result corresponding to the constraint condition.

Wherein, the logic operation in the embodiment of the present invention includes at least one of the following: an AND gate, an OR gate, and a NOT gate.
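As an illustration of how two models can be chained, case (3) above can be sketched as an addition step followed by a comparison step under the first bitmap data. The sketch assumes a boolean list as the bitmap (True for a visible row); the function names and sample values are hypothetical.

```python
from typing import List


def add_columns(col_x: List[int], col_y: List[int]) -> List[int]:
    """First model (assumed form): add two columns row by row."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of rows")
    return [x + y for x, y in zip(col_x, col_y)]


def greater_than(first_mask: List[bool], left: List[int], right: List[int]) -> List[bool]:
    """Second model (assumed form): keep a row visible only if it was
    visible in the incoming mask and left > right in that row."""
    if not (len(first_mask) == len(left) == len(right)):
        raise ValueError("mask and columns must have the same number of rows")
    return [visible and (l > r) for visible, l, r in zip(first_mask, left, right)]


fifth_column = [1, 4, 10, 3]
sixth_column = [2, 1, 5, 3]
eighth_column = [5, 5, 5, 5]
first_mask = [True, True, False, True]  # third row is already hidden

seventh_column = add_columns(fifth_column, sixth_column)
third_mask = greater_than(first_mask, seventh_column, eighth_column)
```

The third row satisfies the comparison (15 > 5) but stays hidden because the first bitmap data had already hidden it; only the fourth row remains visible in the resulting bitmap.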

In addition, in another possible embodiment, the processing module 1002 may be specifically used to input expressions and target data into the target model;

Based on the target model, a single instruction, multiple data (SIMD) instruction set is used to process the expression and the target data in parallel, and the execution result corresponding to the constraint condition is obtained.
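The batch-at-a-time shape that makes SIMD execution possible can be sketched as below. Pure Python does not issue SIMD instructions itself, so this only illustrates how a column is cut into fixed-size batches that a compiled kernel could process with vector instructions; `BATCH_SIZE` and the function names are hypothetical.

```python
from typing import Iterator, List

BATCH_SIZE = 4  # hypothetical batch width, standing in for a vector register width


def batches(column: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive fixed-size slices of a column (the last may be shorter)."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def greater_than_constant(column: List[int], constant: int, size: int = BATCH_SIZE) -> List[bool]:
    """Compare every value with a constant, one batch at a time."""
    mask: List[bool] = []
    for batch in batches(column, size):
        # The same comparison is applied to every element of the batch,
        # which is the pattern a SIMD kernel would execute in one instruction.
        mask.extend(value > constant for value in batch)
    return mask


column = [3, 9, 1, 7, 8, 2, 6, 10, 4]
batch_list = list(batches(column, BATCH_SIZE))
mask = greater_than_constant(column, 5)
```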

It should be noted that the data in the columnar database involved in the embodiment of the present invention is stored by column, and each column separately stores data of at least one attribute.
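The difference between row-wise and column-wise storage can be shown with a minimal sketch. The table contents and the helper `to_columns` are hypothetical; the point is only that, once data is stored by column, a single attribute can be read as one contiguous list without touching the other attributes.

```python
from typing import Dict, List

# Row-oriented view: each record carries every attribute.
rows = [
    {"id": 1, "price": 10, "qty": 2},
    {"id": 2, "price": 25, "qty": 1},
    {"id": 3, "price": 7, "qty": 5},
]


def to_columns(records: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Regroup records so that each attribute is stored in its own list."""
    columns: Dict[str, List[int]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
price_column = columns["price"]  # read one attribute without scanning the others
```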

In summary, in the embodiment of the present invention, the target model corresponding to the expression in the data query request is obtained, where the expression contains constraint conditions; the expression and the target data in the columnar database are input into the target model to obtain the execution result corresponding to the constraint condition. Here, the embodiment of the present invention makes full use of the feature that a columnar database can quickly obtain a given column of data, and reduces the number of write or output operations in expression calculations, so that the time for obtaining data is greatly reduced, the amount of calculation is reduced, and data processing efficiency is improved, thereby improving the data query speed of SQL statements.

FIG. 11 shows a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the data processing method and the data processing apparatus according to an embodiment of the present invention.

The device may include a processor 1101 and a memory 1102 storing computer program instructions.

Specifically, the aforementioned processor 1101 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

The memory 1102 may include mass storage for data or instructions. By way of example and not limitation, the memory 1102 may include a hard disk drive (HDD), a floppy disk drive, a flash memory, an optical disk, a magneto-optical disk, a magnetic tape, a universal serial bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 1102 may include removable or non-removable (or fixed) media. Where appropriate, the memory 1102 may be inside or outside the computing device. In a particular embodiment, the memory 1102 is a non-volatile solid state memory. In a particular embodiment, the memory 1102 includes read-only memory (ROM). Where appropriate, the ROM can be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM) or flash memory, or a combination of two or more of these.

The processor 1101 reads and executes computer program instructions stored in the memory 1102 to implement any data processing method in the foregoing embodiments.

The transceiver 1103 is mainly used to implement communication between various devices in the embodiment of the present invention or with other devices.

In an example, the device may also include a bus 1104. Wherein, as shown in FIG. 11, the processor 1101, the memory 1102, and the transceiver 1103 are connected through a bus 1104 and complete mutual communication.

The bus 1104 includes hardware, software, or both. By way of example and not limitation, the bus may include an accelerated graphics port (AGP) or other graphics bus, an enhanced industry standard architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an industry standard architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a peripheral component interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these. Where appropriate, the bus 1104 may include one or more buses. Although the embodiments of this application describe and show a specific bus, this application contemplates any suitable bus or interconnection.

In addition, the embodiment of the present invention also provides a computer-readable storage medium corresponding to the foregoing data processing method. In a possible embodiment, the embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to perform the data processing method involved in the embodiment of the present invention.

It should be clear that the present invention is not limited to the specific configuration and processing described in the above embodiments and shown in the figure. For the convenience and brevity of the description, detailed descriptions of known methods are omitted here, and the specific working processes of the systems, modules, and units described above can refer to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

Those skilled in the art can clearly understand that the method process of the present invention is not limited to the specific steps described and shown. After understanding the spirit of the present invention, those skilled in the art can make various changes, modifications and additions, make equivalent substitutions, or change the order of steps within the technical scope disclosed by the present invention, and these modifications or substitutions should all be covered by the protection scope of the present invention.

Claims

1. A data processing method, including:

Obtaining a target model corresponding to an expression in the data query request, where the expression includes constraint conditions;

Inputting the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint condition; wherein the target data includes all the data in at least one column and part of the data in at least one column;

Outputting the execution result.
2. The method according to claim 1, wherein inputting the expression and target data in the columnar database into the target model to obtain an execution result corresponding to the constraint condition comprises:

Input the first bitmap data, the expression and the target data into the target model;

Filter the target data by using the first bitmap data to obtain data that meets the constraint condition;

Based on the target model and the first bitmap data, the data satisfying the constraint condition is processed to obtain an execution result corresponding to the constraint condition.
3. The method according to claim 2, wherein the target data includes a first column of data and a second column of data, and the first column of data and the second column of data have the same number of rows;

Based on the target model and the first bitmap data, processing the data that meets the constraint condition to obtain an execution result corresponding to the constraint condition includes:

Based on the target model, the first column of data and the second column of data in the same row are added to obtain a third column of data; wherein the third column of data and the first column of data have the same number of rows;

The first bitmap data and the third column data are determined as execution results corresponding to the constraint condition.
4. The method according to claim 2, wherein the target data includes a fourth column of data and constant data, and the fourth column of data and the constant data have the same number of rows;

Based on the target model and the first bitmap data, processing the data that meets the constraint condition to obtain an execution result corresponding to the constraint condition includes:

When the constraint condition is to select data larger than the constant data in the fourth column of data, based on the target model, compare the fourth column of data with the constant data having the same number of rows to obtain the execution result corresponding to the constraint condition;

Wherein, the execution result includes second bitmap data, the second bitmap data includes the visible rows, marked on the first bitmap data, that correspond to the rows in which the fourth column of data is greater than the constant data having the same number of rows, and the second bitmap data and the fourth column of data have the same number of rows.
5. The method according to claim 2, wherein the target model includes a first model and a second model; the target data includes first target data and second target data, the first target data includes a fifth column of data and a sixth column of data, and the fifth column of data and the sixth column of data have the same number of rows;

Based on the target model and the first bitmap data, processing the data that meets the constraint condition to obtain an execution result corresponding to the constraint condition includes:

Based on the first model, the fifth column of data and the sixth column of data in the same row are added to obtain a seventh column of data; wherein the seventh column of data and the fifth column of data have the same number of rows;

Input the first bitmap data, the seventh column of data, and the second target data into the second model; wherein the second target data includes an eighth column of data, and the eighth column of data and the seventh column of data have the same number of rows;

When the constraint condition is to select data in the seventh column of data that is greater than the eighth column of data, based on the second model, the seventh column of data is compared with the eighth column of data having the same number of rows to obtain the execution result corresponding to the constraint condition;

Wherein, the execution result includes third bitmap data, the third bitmap data includes the visible rows, marked on the first bitmap data, that correspond to the rows in which the seventh column of data is greater than the eighth column of data having the same number of rows, and the third bitmap data has the same number of rows as the seventh column of data.
6. The method according to claim 2, wherein the target data includes a ninth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the ninth column of data have the same number of rows;

Based on the target model and the first bitmap data, processing the data that meets the constraint condition to obtain an execution result corresponding to the constraint condition includes:

Based on the target model, use the hidden rows to filter hidden rows in the ninth column of data to obtain data in the ninth column of data that has the same row as the visible row;

When the expression is data summation, accumulate the data of the same row in the visible rows to obtain accumulated data;

The accumulated data and the first bitmap data are determined as the execution result corresponding to the constraint condition.
7. The method according to claim 2, wherein the target data includes a tenth column of data, the first bitmap data includes hidden rows and visible rows, and the first bitmap data and the tenth column of data have the same number of rows;

Based on the target model and the first bitmap data, processing the data that meets the constraint condition to obtain an execution result corresponding to the constraint condition includes:

Based on the target model, use the hidden rows to filter hidden rows in the tenth column of data to obtain data in the tenth column of data that has the same row as the visible row;

Marking the data in the tenth column of data that has the same row as the visible row according to the first preset condition in the target model to obtain the marked eleventh column of data;

The first bitmap data and the eleventh column of data are determined as execution results corresponding to the constraint condition.
8. The method according to claim 7, wherein the target model includes the first preset condition and a second preset condition, and the visible rows include a first visible row and a second visible row;

According to the first preset condition in the target model, mark the data in the tenth column of data that has the same row as the visible row to obtain the marked eleventh column of data, including:

Mark the data in the tenth column of data that has the same row as the first visible row according to the first preset condition, and adjust the first visible row to a hidden row;

According to the second preset condition, mark the data in the tenth column of data that has the same row as the second visible row, and end the marking when the third preset condition is met, to obtain the marked eleventh column of data.
9. The method according to claim 2, wherein, based on the target model and the first bitmap data, processing the data that satisfies the constraint condition to obtain an execution result corresponding to the constraint condition includes:

According to the first bitmap data, the expression, and the target data, the fourth bitmap data corresponding to the first expression and the fifth bitmap data corresponding to the second expression are determined; wherein the first expression or the second expression is related to any column of data in the target data;

Performing logical operations on the fourth bitmap data and the fifth bitmap data based on the target model to obtain sixth bitmap data;

The sixth bitmap data is determined as the execution result corresponding to the constraint condition.
10. The method according to claim 9, wherein the logic operation includes at least one of the following: an AND gate, an OR gate, and a NOT gate.
11. The method according to claim 1, wherein inputting the expression and target data in the columnar database into the target model to obtain an execution result corresponding to the constraint condition comprises:

Input the expression and the target data into the target model;

Based on the target model, a single instruction, multiple data (SIMD) instruction set is used to perform parallel processing on the expression and the target data to obtain an execution result corresponding to the constraint condition.
12. The method according to claim 1, wherein the data in the columnar database is stored in columns, and each column separately stores data of at least one attribute.
13. A data processing device, including:

The obtaining module is used to obtain the target model corresponding to the expression in the data query request, where the expression includes constraint conditions;

The processing module is used to input the expression and the target data in the columnar database into the target model to obtain the execution result corresponding to the constraint condition; wherein the target data includes all data in at least one column and part of the data in at least one column;

The output module is used to output the execution result.
14. A computing device, wherein the device includes at least one processor and a memory, the memory is used to store computer program instructions, and the processor is used to execute the program in the memory to control the computing device to implement the data processing method according to any one of claims 1-12.
15. A computer-readable storage medium with a computer program stored thereon, wherein, if the computer program is executed in a computer, the computer is caused to execute the data processing method according to any one of claims 1-12.